0

So i'm working on an app that needs to frequently read large XLSX files. I'm using Java, and Apache POI keeps running out of memory on certain XLSX files. I know theres a way to XML parse with POI, but it looks pretty messy.

Resaving these files as another format (XLS, CSV) is not an option because the entire process needs to be automated, and some of these files have multiple sheets or exceed the row count allotted for XLS files.

I've also thought about writing a script to "recreate" the excel files with only the underlying data, but this is not ideal because there are formats that need to be preserved on some files.

Are there any languages that I can call from Java that can read large XLSX files without memory issues?

Gus
  • 241
  • 1
  • 6
  • 17
  • 1
    This could help you: http://stackoverflow.com/questions/3560950/best-language-to-parse-extremely-large-excel-2007-files – Dalton Nov 18 '14 at 23:38
  • @EJP it doesn't seem like theres a viable solution to this in Java, so I'm wondering if theres another language i can call – Gus Nov 19 '14 at 15:22
  • @Dalton I gave that a try with a 40 MB file I have, and it ran out of heap. Unfortunately increasing heap is not an option at this point – Gus Nov 19 '14 at 15:33
  • 1
    The POI XLSX SAX/Event stuff is pretty easy to use for text extraction type use cases, are you sure you can't just use it? – Gagravarr Nov 19 '14 at 16:05

1 Answers1

1

@Gus, I had the same problem. I had to read a 13MB XLSX and ran out of heap with conventional POI. I had to implement XSSF+SAX API to read the file. Although very difficult to understand at first, I'm able to read my XLSX file easily with it (and very quickly, too).

http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

The Apache guys give an example of it's usage in the link. In my case I copied the code and adapted to my needs.

Dalton
  • 420
  • 4
  • 12