
I've been working on a project that uses Apache POI to read .ppt files and change some attributes of the `SlideShowDocInfoAtom` record in the file.

I can read the file using `HSLFSlideShow`. However, with a large .ppt file (e.g. over 1 GB) and my application's JVM max heap size restricted to 2 GB, POI throws an `OutOfMemoryError`.

After reading the source code, I found that POI creates a byte array when it reads one of the streams of the file. In the 1 GB file, the `PowerPoint Document` stream can be up to 1 GB, so the byte array for it consumes 1 GB of memory, and this somehow causes the JVM to crash.

So, is there any way to read a large .ppt file without enlarging the JVM heap size? I only want to read some doc info from the file, and I don't want to load large blocks of it, such as audio or video, into memory.
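To make the idea concrete, here is a minimal, read-only sketch of the kind of streaming scan I have in mind. It uses only the JDK and works on any `InputStream` that delivers the raw `PowerPoint Document` stream. The class and helper names (`PptRecordScanner`, `findFirstAtom`, `record`) are hypothetical. It assumes the usual 8-byte PPT record header (a 16-bit version/instance word, a 16-bit type, a 32-bit length, all little-endian), that containers carry version nibble `0xF`, and that `SlideShowDocInfoAtom` has record type 1025 (`0x0401`). Getting such a stream out of the OLE2 file without buffering it (for example through POI's POIFS layer opened from a `File`) is a further assumption I have not verified, and the sketch ignores the fact that an edited file may contain more than one version of a record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class PptRecordScanner {
    // Assumed record type of SlideShowDocInfoAtom (1025 decimal).
    static final int SLIDE_SHOW_DOC_INFO_ATOM = 0x0401;

    /** Returns the body of the first record of the wanted type, or null if none is found. */
    static byte[] findFirstAtom(InputStream in, int wantedType) throws IOException {
        byte[] header = new byte[8];
        while (readHeader(in, header)) {
            int recVer = header[0] & 0x0F; // low nibble of the first header word
            int recType = (header[2] & 0xFF) | ((header[3] & 0xFF) << 8);
            long recLen = (header[4] & 0xFFL) | ((header[5] & 0xFFL) << 8)
                    | ((header[6] & 0xFFL) << 16) | ((header[7] & 0xFFL) << 24);
            if (recType == wantedType) {
                if (recLen > Integer.MAX_VALUE - 8) throw new IOException("record too large");
                byte[] body = new byte[(int) recLen];
                readFully(in, body, 0, body.length);
                return body;
            }
            if (recVer == 0x0F) continue; // container: its children follow immediately
            skipFully(in, recLen);        // atom we do not need: skip the body, never buffer it
        }
        return null;
    }

    /** Reads one 8-byte header; returns false on a clean end of stream. */
    static boolean readHeader(InputStream in, byte[] header) throws IOException {
        int first = in.read();
        if (first < 0) return false;
        header[0] = (byte) first;
        readFully(in, header, 1, header.length - 1);
        return true;
    }

    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) throw new EOFException("truncated record");
            off += n;
            len -= n;
        }
    }

    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) { n -= skipped; continue; }
            // skip() may return 0 without being at EOF, so probe with a single read
            if (in.read() < 0) throw new EOFException("truncated record body");
            n--;
        }
    }

    /** Demo helper: encodes one record (header plus body) in little-endian order. */
    static byte[] record(int verAndInstance, int type, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] headerBytes = {verAndInstance, verAndInstance >> 8, type, type >> 8,
                body.length, body.length >> 8, body.length >> 16, body.length >> 24};
        for (int b : headerBytes) out.write(b); // write(int) keeps only the low 8 bits
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic stream: one container holding an unrelated atom and the wanted atom.
        byte[] other = record(0x0000, 0x03E9, new byte[40]);
        byte[] wanted = record(0x0000, SLIDE_SHOW_DOC_INFO_ATOM, new byte[] {1, 2, 3, 4});
        byte[] children = new byte[other.length + wanted.length];
        System.arraycopy(other, 0, children, 0, other.length);
        System.arraycopy(wanted, 0, children, other.length, wanted.length);
        byte[] stream = record(0x000F, 0x03E8, children);

        byte[] body = findFirstAtom(new ByteArrayInputStream(stream), SLIDE_SHOW_DOC_INFO_ATOM);
        System.out.println(body == null ? "not found" : "found, " + body.length + " bytes");
    }
}
```

With this approach only the 8-byte headers and the one wanted body are held in memory, so large media atoms would be skipped rather than loaded. It only covers reading; changing the atom and writing the file back is a separate problem.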

skaleto
  • Possible duplicate of [java.lang.OutOfMemoryError: Java heap space while reading excel with Apache POI](https://stackoverflow.com/questions/6069847/java-lang-outofmemoryerror-java-heap-space-while-reading-excel-with-apache-poi) – Dang Nguyen Jan 24 '19 at 04:15
  • The `SlideShowDocInfoAtom` is a `Record`. So if only that record is needed then it would be sufficient only having [HSLFSlideShowImpl](https://poi.apache.org/apidocs/dev/org/apache/poi/hslf/usermodel/HSLFSlideShowImpl.html) since this has [HSLFSlideShowImpl.getRecords](https://poi.apache.org/apidocs/dev/org/apache/poi/hslf/usermodel/HSLFSlideShowImpl.html#getRecords--) to get all records. So first try whether creating `HSLFSlideShowImpl` also leads to out of memory. – Axel Richter Jan 24 '19 at 05:03
  • I'm using POI to open a PPT file, not an XLS file. For XLS files POI provides a different way to load them, the **event mode API**, which I know can save memory, but for PPT files there seems to be no such API. – skaleto Jan 24 '19 at 09:01
  • And I have read the source code and found that the memory leak happens when it is going to create HSLFSlideShowImpl by loading all the data into memory – skaleto Jan 24 '19 at 09:03
  • "memory leak happens when it is going to create HSLFSlideShowImpl": Then what you want is not possible. Because of the internal binary file structure of PPT, an event user model is not possible. At least not for reading **and** writing. – Axel Richter Jan 24 '19 at 11:57

0 Answers