The POI API is full of acronyms, the meanings of which is not understandable (e.g. HSSF, XSSF, SXSSF) until you read the documentation. So let's start with some definitions:
- HSSF(Horrible SpreadSheet Format) is the POI Project's pure Java implementation of the Excel '97(-2007) file format.
- XSSF (XML SpreadSheet Format) is the POI Project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.
- SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. It's built on top of XSSF. Available only since 3.8-beta3.
- UserModel
- UserEventModel.
Figure 1. Spreadsheet API Feature Summary |
So at first you need to download the library from official site or to configure maven dependencies. I prefer the second option, so here are dependencies (I got them from mvnrepository.com):
<dependency> <groupId>org.apache.poi</groupId> <artifactId>poi</artifactId> <version>3.5-FINAL</version> <type>jar</type> <scope>compile</scope> </dependency> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi-contrib</artifactId> <version>3.5-FINAL</version> <type>jar</type> <scope>compile</scope> </dependency> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi-ooxml</artifactId> <version>3.5-FINAL</version> <type>jar</type> <scope>compile</scope> </dependency>
There are two ways to read data in usermodel:
- Read only data and no null value. It usually produces ragged arrays(depends on your data)
- Read all data (inclusive nulls)
public List<List<String>> parseSpreadSheet(InputStream inputStream) { Workbook workBook = null; try { workBook = WorkbookFactory.create(inputStream); } catch (Exception e) { throw new RuntimeException(e); } Sheet sheet = workBook.getSheetAt(0); List<List<String>> rowHolder = new ArrayList<List<String>>(); int cellNum = sheet.getRow(1).getLastCellNum(); for (int i = 0; i <= sheet.getLastRowNum(); i++) { Row row = sheet.getRow(i); List<String> cellHolder = new ArrayList<String>(); for (int j = 0; j < row.getLastCellNum(); j++) { Cell cell = row.getCell(j); String cellValue = parseCellValue(workBook, cell); cellHolder.add(cellValue); } //add empty cells to the end if required while (cellHolder.size() < cellNum) { cellHolder.add(null); } rowHolder.add(cellHolder); } return rowHolder; }
The return type of the method is Java SpreadSheet representation (selected by me). The argument of the method is InputStream of the SpreadSheet file.
I have decided to use Workbook(represents Excel file), which is the parent class of HSSFWorkbook and XSSFWorkbook. WorkbookFactory frees you from necessity to write conditions for different kinds of files (xls and xlsx), it creates appropriate descendant class from InputStream. The rest of the code is simple(The code of parseCellValue() method I will show later).
We need the method to parse cell values:
private String parseCellValue(Workbook workBook, Cell cell) { FormulaEvaluator evaluator = workBook.getCreationHelper().createFormulaEvaluator(); String cellValue = null; if (cell != null) { switch (cell.getCellType()) { case Cell.CELL_TYPE_STRING: cellValue = cell.getRichStringCellValue().getString(); break; case Cell.CELL_TYPE_NUMERIC: if (DateUtil.isCellDateFormatted(cell)) { cellValue = cell.getDateCellValue().toString(); } else { cellValue = new Double(cell.getNumericCellValue()).toString(); } break; case Cell.CELL_TYPE_BOOLEAN: cellValue = new Boolean(cell.getBooleanCellValue()).toString(); break; case Cell.CELL_TYPE_FORMULA: cellValue = evaluator.evaluate(cell).formatAsString(); break; } } return cellValue; }
As you can see we should parse cell value according to its Excel format. Note POI UserAPI is pretty powerful it even allows us to evaluate formula values.
And finally here's the way you can read all data but null values. This type of reading is based upon iterators of Row and Cell:
public List<List<String>> readSpreadSheetWOnull(InputStream inputStream) { Workbook workBook = null; try { workBook = WorkbookFactory.create(inputStream); Sheet sheet = workBook.getSheetAt(SHEET_NUMBER); Iterator<Row> rowIter = sheet.rowIterator(); List<List<String>> rowHolder = new ArrayList<List<String>>(); while (rowIter.hasNext()) { Row row = (Row) rowIter.next(); Iterator<Cell> cellIter = row.cellIterator(); List<String> cellHolder = new ArrayList<String>(); while (cellIter.hasNext()) { Cell cell = (Cell) cellIter.next(); String cellValue = parseCellValue(workBook, cell); cellHolder.add(cellValue); } rowHolder.add(cellHolder); } return rowHolder; } catch (Exception e) { throw new RuntimeException(e); } }
That's all. Later I will write a post about UserEventModel