0

I am trying to open an excel file which is 22MB in size and contains 9905 rows (not that large in my opinion).

I try to open it like this with java apache poi:

final File f = Utils.getFileWithRespectToJar(this, filename);
final Workbook workbook = WorkbookFactory.create(f);

final Sheet sheet = workbook.getSheetAt(0);
final int maxRowLength = sheet.getLastRowNum();

After a few seconds, java takes about 3.3GB of RAM and 90% of CPU and after a minute or so my eclipse project crashes with the exception

java.lang.OutOfMemoryError: GC overhead limit exceeded

Is apache poi just very unperformant or am I doing something wrong?

Edit: I was previously using a .xlsm file (an excel file with macros enabled). I saved it again as a .xlsx file (macros disabled) and increased the heap size to 4GB. Now java takes 4GB of RAM (=90%) and 90% of CPU for around ~15 minutes before my eclipse application crashes. I was using apache poi 3.16 and updated my maven pom file to use 4.0.0 (latest version) but this new version works differently so I get a lot of exceptions, e.g:

org.apache.poi.ooxml.POIXMLException: org.apache.xmlbeans.XmlOptions.setEntityExpansionLimit(I)Lorg/apache/xmlbeans/XmlOptions; at org.apache.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:66) at org.apache.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:648) at org.apache.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:180) at org.apache.poi.xssf.usermodel.XSSFWorkbook.(XSSFWorkbook.java:286) at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:83) at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:111)

Caused by: java.lang.NoSuchMethodError: org.apache.xmlbeans.XmlOptions.setEntityExpansionLimit(I)Lorg/apache/xmlbeans/XmlOptions; at org.apache.poi.ooxml.POIXMLTypeLoader.(POIXMLTypeLoader.java:43) at org.apache.poi.xssf.model.ThemesTable.(ThemesTable.java:86)

Project_Prkt
  • 91
  • 1
  • 12
  • Try using the Streaming API. See: https://stackoverflow.com/questions/33786219/apache-poi-streaming-sxssf-for-reading – Alan Hay Oct 31 '18 at 08:31
  • 2
    Opening a Workbook from an Excel file is not only reading in the bytes of the file but also creating all relations between the workbook, it's sheets, the shared strings of the sheets, the formulas in the cells, the cell styles, ... So how much memory this takes highly depends on the internally structure of the workbook and how much of such relations there are. So answering such questions are mostly not possible without having the Excel file. – Axel Richter Oct 31 '18 at 08:31
  • 2
    Messing around with different versions of apache poi will not lead to success. If using apache poi 3.16 leads to `GC overhead limit exceeded` then using apache poi 4.0.0 will also lead to that. Updating to the newest version will be good. But then all dependencies must be updated. Mixing of differnet versions is not supported. – Axel Richter Oct 31 '18 at 08:36
  • Okay, I now updated all dependencies I used, trying again :) – Project_Prkt Oct 31 '18 at 08:53

1 Answers1

0

I removed all cell formatting and macros, so basically the excel file only is black text now and apache poi doesn't crash anymore.

Project_Prkt
  • 91
  • 1
  • 12
  • 1
    So the needed memory for storing the relations between cells and their cell styles and probably rich text in the shared strings table was what had been the last straw. Most programmers do not understand the differnce between running `Excel` as GUI, where it has fully access to all memory of the client system where the GUI is running in, and running `apache poi` in a server system, where it has only partially access to the server's memory because other server processes needs memory too. So the claim "`apache poi` is just very unperformant" is not true. – Axel Richter Oct 31 '18 at 11:53