converting excel to XSSFWorkbook using Apache poi taking long time

Question

I'm trying to convert an excel file to XSSFWorkbook , I have around 7000 rows and around 145 columns. It's taking around 15 mins to convert that excel file to XSSFWorkbook at line number 2 , in my code below :-

InputStream fs = new FileInputStream(filename);   // (1)
XSSFWorkbook wb = new XSSFWorkbook(fs);           // (2)
XSSFSheet sheet = wb.getSheetAt(0);

Instead of adding 7000 rows to XFFSWorkbook , I just want to add 30 rows to XFFSWorkbook while conversion at line number 2 ?

If not, how can I reduce the amount of time taken to convert excel to XSSFWorkbook ?

Please correct your question. Somewhere you said 7000 rows and other 7000 coloumns — niks, Sep 09 '17 at 14:31
To increase speed pass in a File and not a stream. See this: https://stackoverflow.com/a/17542608/338249?stw=2 — rjdkolb, Sep 10 '17 at 06:00
Thanks rjdkolb it helped a bit , Can you let me know how to convert StreamingReader to Workbook (Apache poi) — Peter Osta, Sep 10 '17 at 12:08
@Peter Osta:Only from curiosity: What exactly "helped a bit"? @rjdkolb suggests using a `File` instead of a `FileInputStream` while opening the `Workbook`. This "helped a bit"? Or had using [Excel Streaming Reader](https://github.com/monitorjbl/excel-streaming-reader) helped a bit? If the latter: Please read [Important! Read first!](https://github.com/monitorjbl/excel-streaming-reader#important-read-first) ff. carefully. — Axel Richter, Sep 11 '17 at 07:58

Axel Richter · Answer 1 · 2017-09-10T06:32:05.470

A *.xlsx file is a ZIP archive containing the data of the Excel in a directory structure having different XML files.

For example there are

/xl/workbook.xml describing the basic workbook structure,
/xl/worksheets/sheet1.xml, /xl/worksheets/sheet2.xml, ... /xl/worksheets/sheetN.xml having the sheet data - Here are the rows and the cells but not all data within the cells are directly stored there. Also the cell styles are not directly stored there. - ,
/xl/styles.xml which contains the cell styles,
/xl/sharedStrings.xml which contains all string content of cells in all sheets. This is to avoid multiple storing the same string much times if this string is used multiple times in cells.

So if you wants to read the *.xlsx ZIP archive, you needs unpacking the ZIP archive and then parsing at least the four XML files mentioned above to get the data for the XSSFWorkbook. This is what apache poi does while XSSFWorkbook wb = new XSSFWorkbook(fileinputstream);.

So if you really needs a XSSFWorkbook as the result, there is no way around this process. And if you not suspects that apache poi had programmed explicit delay routines, then there will not be a possibility to reduce the amount of time for this process.

Your approach only to read less rows than are stored into the sheet, could possibly be time saving. But then your result would be a XSSFWorkbook containing all the styles and all the string contents but only some sheet data related to those styles and string data. So it will lead to a partially broken XSSFWorkbook. Thats why nobody has really thought about this approach.

Only if the requirement is only to read the plain unformatted data from one of the /xl/worksheets/sheetN.xml without creating a XSSFWorkbook, then you only needs unpacking the ZIP archive and then parsing only the needed /xl/worksheets/sheetN.xml and the /xl/sharedStrings.xml to get the string content of the cells from. This would be possible in less time than the whole process described above.

rjdkolb · Answer 2 · 2017-09-11T12:08:23.870

1

To increase performance slightly on very large files, simply pass the file directly in to the workbook instead of a stream.

From this link:

When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.

XSSFWorkbook wb = new 
XSSFWorkbook(new File (filename));           // (2)
XSSFSheet sheet = wb.getSheetAt(0);

Also, it's cleaner to use the WorkBookFactory as it abstracts the type of excel file you are working with :

Workbook workbook = WorkbookFactory.create(new File(filename));

edited Sep 11 '17 at 12:08

answered Sep 10 '17 at 06:04

rjdkolb

10,377
11
69
89

The problem with using a `File` instead a `FileInputStream` is that you cannot write changings on the `XSSFWorkbook` into the same file `filename`. – Axel Richter Sep 10 '17 at 06:39
I guess that could be a problem. I write reports off templates, so it fits my use case. – rjdkolb Sep 10 '17 at 06:52
Just tested and there is not a significant time saving using `File` versus `FileInputStream`. This is not surprising since the difference is only that using a `File` the `ZipPackage` is opened form the `File` directly. While using `FileInputStream` it is opened by reading the whole `ZIP` content into the memory. So using a `File` there is a lower memory footprint. But the whole parsing process will be the same. So there could be time saving only if there is less memory available too. – Axel Richter Sep 10 '17 at 07:31

converting excel to XSSFWorkbook using Apache poi taking long time

2 Answers2