I developed a Java class to use the SXSSF workbook to overcome the Java Heap error which is always caused by the XSSF workbook when writing very large spreadsheets. See http://poi.apache.org/spreadsheet/how-to.html#sxssf for the solution.
In short, I use workbook = new SXSSFWorkbook(SXSSFWorkbook.DEFAULT_WINDOW_SIZE);
instead of workbook = new XSSFWorkbook();
This solution works like magic and I am now able to write very large xlsx spreadsheets with a much lower memory footprint.
However, there is one problem which I am stuck. When using the SXSSF workbook, it actually generates a larger xlsx file size compared with Excel itself. For example, 33MB is created by SXSSF while Excel creates 25MB with the same dataset. Why?
I investigated and found the reason:
I changed the .xlsx extension name to .zip so I could open it and view the \xl\worksheets\sheet1.xml worksheet file. I discovered inline strings using t="inlineStr"
are produced by SXSSF while the shared strings table using t="s" (v tag) is produced by Excel. I checked XSSF also creates the shared strings table. Having t="inlineStr" in every cell in an enormous spreadsheet blows up the file size faster.
My question, is it possible to force the SXSSF workbook to generate the sheet1.xml
file based on the shared strings table (using v tags - t="s") rather than the inline strings (t="inlineStr")? I am hoping this solution would reduce the final file size considerably. Anyone knows?