From experience, SAX really helps a lot with memory performance. Went from 4GB+ to around 300MB.
Some useful links and other tips:
From https://poi.apache.org/spreadsheet/limitations.html
File sizes/Memory usage
There are some inherent limits in the Excel file formats. These are
defined in class SpreadsheetVersion. As long as you have enough
main-memory, you should be able to handle files up to these limits.
For huge files using the default POI classes you will likely need a
very large amount of memory.
There are ways to overcome the main-memory limitations if needed: For
writing very huge files, there is SXSSFWorkbook which allows to do a
streaming write of data out to files (with certain limitations on what
you can do as only parts of the file are held in memory). For reading
very huge files, take a look at the sample XLSX2CSV which shows how
you can read a file in streaming fashion (again with some limitations
on what information you can read out of the file, but there are ways
to get at most of it if necessary).
Also
https://poi.apache.org/faq.html#faq-N10165
- I think POI is using too much memory! What can I do? This one comes up quite a lot, but often the reason isn't what you might
initially think. So, the first thing to check is - what's the source
of the problem? Your file? Your code? Your environment? Or Apache POI?
(If you're here, you probably think it's Apache POI. However, it often
isn't! A moderate laptop, with a decent but not excessive heap size,
from a standing start, can normally read or write a file with 100
columns and 100,000 rows in under a couple of seconds, including the
time to start the JVM).
Apache POI ships with a few programs and a few example programs, which
can be used to do some basic performance checks. For testing file
generation, the class to use is in the examples package,
SSPerformanceTest (viewvc). Run SSPerformanceTest with arguments of
the writing type (HSSF, XSSF or SXSSF), the number rows, the number of
columns, and if the file should be saved. If you can't run that with
50,000 rows and 50 columns in HSSF and SXSSF in under 3 seconds, and
XSSF in under 10 seconds (and ideally all 3 in less than that!), then
the problem is with your environment.
Next, use the example program ToCSV (viewvc) to try reading the a file
in with HSSF or XSSF. Related is XLSX2CSV (viewvc), which uses SAX
parsing for .xlsx. Run this against both your problem file, and a
simple one generated by SSPerformanceTest of the same size. If this is
slow, then there could be an Apache POI problem with how the file is
being processed (POI makes some assumptions that might not always be
right on all files). If these tests are fast, then any performance
problems are in your code!
And
Files vs InputStreams http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream
When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.
If using WorkbookFactory, it's very easy to use one or the other:
// Use a file
Workbook wb = WorkbookFactory.create(new File("MyExcel.xls"));
// Use an InputStream, needs more memory
Workbook wb = WorkbookFactory.create(new FileInputStream("MyExcel.xlsx"));
If using HSSFWorkbook or XSSFWorkbook directly, you should generally
go through NPOIFSFileSystem or OPCPackage, to have full control of the
lifecycle (including closing the file when done):
// HSSFWorkbook, File
NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("file.xls"));
HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
....
fs.close();
// HSSFWorkbook, InputStream, needs more memory
NPOIFSFileSystem fs = new NPOIFSFileSystem(myInputStream);
HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
// XSSFWorkbook, File
OPCPackage pkg = OPCPackage.open(new File("file.xlsx"));
XSSFWorkbook wb = new XSSFWorkbook(pkg);
....
pkg.close();
// XSSFWorkbook, InputStream, needs more memory
OPCPackage pkg = OPCPackage.open(myInputStream);
XSSFWorkbook wb = new XSSFWorkbook(pkg);
....
pkg.close();