5

Is there a memory-efficient Java library to read large Microsoft Excel files (both .xls and .xlsx)? I have very limited experience with Apache POI, and it seemed to be a huge memory hog from what I recall (though perhaps this was just for writing and not for reading). Is there something better? Or am I misremembering and/or misusing POI?

It would be important for it to have a "friendly" open-source license as well.

Michael McGowan
  • only other one I know of is http://jexcelapi.sourceforge.net/ . I have never used it myself so can't really comment on the memory usage. – CoolBeans Jan 20 '11 at 20:57
  • How much memory is too much for you? – Amir Afghani Jan 20 '11 at 21:12
  • How much is too much depends. Ideally though if the file is such that it could be processed if first saved as a .csv, I would like it if it could be processed as an Excel file. The ideal might not be possible, but I would like to be closer. – Michael McGowan Jan 20 '11 at 23:16

4 Answers

5

Apache's POI library has an event-based API with a smaller memory footprint. Unfortunately, it only works with HSSF (Horrible Spreadsheet Format) and not XSSF (XML Spreadsheet Format, for OOXML files).

Vivin Paliath
  • Thanks, but that's a bummer that it doesn't work for XSSF, since that's what would be used for files with lots (>65536) of rows. – Michael McGowan Jan 20 '11 at 23:13
  • Actually it looks like there might be a work-around for XSSF. Can anyone comment on this: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api – Michael McGowan Jan 20 '11 at 23:28
  • @Michael that seems like a decent workaround, albeit slightly more involved. Since XSSF is ultimately XML, you're using a SAX parser to parse the Excel file. – Vivin Paliath Jan 25 '11 at 01:18
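
To illustrate the point in the comments above that an .xlsx file is ultimately XML that a SAX parser can stream: a minimal sketch using only the JDK, not POI's own API. An .xlsx file is a zip archive whose worksheets live under `xl/worksheets/`. The class name `XlsxSaxSketch`, the `CellListener` callback, and the in-memory sample sheet are all made up for this illustration. It assumes cells hold inline numeric values and takes whichever worksheet part appears first in the archive. In real files, string cells usually store an index into `xl/sharedStrings.xml`, which this sketch does not resolve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XlsxSaxSketch {

    /** Hypothetical callback: invoked once per cell value, so no sheet model is kept in memory. */
    public interface CellListener {
        void onCell(String ref, String rawValue);
    }

    /** Streams the first worksheet part encountered in the .xlsx zip container. */
    public static void streamFirstSheet(InputStream xlsx, CellListener listener) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated for <c> and <v>
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    factory.newSAXParser().parse(new InputSource(zip), new SheetHandler(listener));
                    return;
                }
            }
        }
    }

    /** Reacts to <c r="A1"> (cell) and <v> (value) elements as they stream past. */
    private static final class SheetHandler extends DefaultHandler {
        private final CellListener listener;
        private final StringBuilder text = new StringBuilder();
        private String currentRef;
        private boolean inValue;

        SheetHandler(CellListener listener) {
            this.listener = listener;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("c".equals(localName)) {
                currentRef = atts.getValue("r"); // cell reference such as "B7"
            } else if ("v".equals(localName)) {
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(localName)) {
                inValue = false;
                listener.onCell(currentRef, text.toString());
            }
        }
    }

    /** Builds a tiny archive holding only a worksheet part; a stand-in for a real file. */
    static byte[] sampleWorkbook() throws IOException {
        String sheet = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>1.5</v></c><c r=\"B1\"><v>2</v></c></row>"
            + "<row r=\"2\"><c r=\"A2\"><v>40</v></c></row>"
            + "</sheetData></worksheet>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(sheet.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Collects "ref=value" strings from the sample, only to make the demo easy to inspect. */
    public static List<String> readSample() throws Exception {
        List<String> cells = new ArrayList<>();
        streamFirstSheet(new ByteArrayInputStream(sampleWorkbook()),
            (ref, value) -> cells.add(ref + "=" + value));
        return cells;
    }

    public static void main(String[] args) throws Exception {
        for (String cell : readSample()) {
            System.out.println(cell);
        }
    }
}
```

For a real file you would pass a `FileInputStream` to `streamFirstSheet` and handle each cell in the callback instead of collecting it. Memory use then depends on the parser's buffers rather than on the number of rows. The POI how-to linked above covers the same idea, and also handles shared strings and styles.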
2

The Excel file formats are (both) huge and extremely complicated, and anything that reads all of their possible contents is going to be equally huge and complicated. Remember they can contain ranges, macros, links, embedded stuff etc.

However if you are reading something simple like a grid of numbers, I recommend first converting the spreadsheet to something simpler like CSV and then reading that format.

DJClayworth
  • CSV is the preferred format, but sometimes a user might have what amounts to CSV data stored in a .xls file. I don't want to tell them to open Excel, save as CSV, and then come back to my application. Obviously that is a work-around that will work, but it's far from ideal. – Michael McGowan Jan 20 '11 at 23:09
0

Take a look at JExcel:

http://jexcelapi.sourceforge.net/

I can't vouch for the memory footprint, but obviously with large spreadsheets you're going to consume lots of memory for processing.

You should be able to use it for xls and xlsx:

Read XLSX file in Java

Jonathan Holloway
0

I cannot answer your question directly, as I'm not using Java; however I can share a similar experience in Perl that may be partially relevant.

The OOXML format is indeed very large and complex, so any software that aims to cover the full specification is likely to be quite costly in terms of resources. In Perl, the best-known module for reading .xlsx files is https://metacpan.org/pod/Spreadsheet::ParseXLSX, which does the job well for small and medium files; however, it is far too slow on large amounts of data. So I ended up writing another module, https://metacpan.org/pod/Excel::ValueReader::XLSX, with far fewer features, but optimized for fast parsing of large files.

The moral is: there is no one-size-fits-all solution. If you are willing to sacrifice some features for better speed or lower memory consumption, you might find other libraries. In Java, https://github.com/dhatim/fastexcel could perhaps be a good candidate (just from reading the documentation).

dami