Hi everyone!

I am currently working on a tool that uses Apache POI to read Excel files automatically and restructure their contents according to a given set of rules. The project is going great so far, but I have one problem that I am not able to resolve:

- After closing a workbook, the memory allocated for it is not garbage collected.

I have broken down the problem into a tiny piece of code that replicates the issue. I am going to omit try/catch blocks for the sake of readability:

//the declaration and creation of the objects are separated due to omitted try/catch blocks
Workbook wb = null;
FileInputStream fs = null;

//opening filestream and workbook
fs = new FileInputStream("C:/Users/XXX/somefile.xlsm");
wb = WorkbookFactory.create(fs);

//closing them again, making them available for garbage collection
fs.close();
wb.close();

//added to make sure that the reference to the workbook/filestream is null
fs = null;
wb = null;

//added to manually trigger gc in hope that this will fix it
Runtime.getRuntime().gc();

//wait forever for me to check the RAM usage
while (true) { Thread.sleep(1000); }

As soon as POI is used to open a workbook, it seems to create some kind of buffer that fills the maximum amount of memory specified by the Xmx argument. The memory is not freed when I close the workbook. I also tried a version without the use of the factory to check if there might be lost references through that module, but no luck...

Can someone give me a hint on why the memory is not deallocated/garbage collected?

By the way, I am using Apache POI 3.17, but I also tested 4.0 (but not the recently released 4.0.1, tbh... yes, I am a hack and a fraud ^^)

Thank you very much in advance!

sapiensl
  • Interesting, how much memory does it use? How much does it increase after loading your workbooks and closing them? Does it just keep increasing when you open new workbooks and close them? – Mark Dec 14 '18 at 13:23
  • The simplest approach would be to create a heap dump and analyze it. – Holger Dec 14 '18 at 15:23
  • @Mark It uses 256 MB with -Xmx256m, 1 GB with -Xmx1g etc. The usage jumps to this point whenever I first use POI and stays there until the program exits. This made me think that what I am looking at is an out-of-control heap buffer allocation. – sapiensl Dec 14 '18 at 15:38
  • @Holger Thank you for your comment! I will look into heap analysis and check if I can find something out. – sapiensl Dec 14 '18 at 15:41
  • The question is if the memory is actually still used. How do you check memory usage? Check out this answer from another thread: https://stackoverflow.com/a/34851894/4949750 – Amongalen Dec 14 '18 at 15:56
  • @sapiensl You can try to use try-with-resources which automatically closes resources. `try (FileInputStream fs = new FileInputStream("C:/Users/XXX/somefile.xlsm");Workbook wb = WorkbookFactory.create(fs);) { }` if the problem is in garbage collection. – Margulan Zharkenov Dec 15 '18 at 06:30
  • @Amongalen Ah, that might be it. The behaviour is exactly the same. I will experiment with this if I can find an hour away from our more pressing projects today and will report back when I get to it. Thank you for the response, very much appreciated! – sapiensl Dec 17 '18 at 09:05

2 Answers

Hi again,

I figured it out: Java was just lazy in reducing the heap allocation. Since the software only requires a lot of memory for a very short amount of time, I managed to tame the behaviour using the following JVM arguments:

-Xms32m
-Xmx1g
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC 
-XX:MaxHeapFreeRatio=15 
-XX:MinHeapFreeRatio=5

Now, the memory is returned after the operation is finished.
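
To see the difference between memory that is actually in use and memory the JVM has merely reserved from the operating system, a small check along these lines may help. It is only a sketch: the class name HeapReport and its helper methods are made up for illustration, and the 32 MiB allocation merely stands in for loading a workbook. It relies on the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of POI: prints used vs. committed heap.
class HeapReport {

    /** Takes a snapshot of the current heap usage as seen by the JVM. */
    static MemoryUsage snapshot() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats used, committed and max heap in MiB (max may be undefined, i.e. -1 bytes). */
    static String describe(MemoryUsage heap) {
        long mib = 1024L * 1024L;
        return String.format("used=%d MiB, committed=%d MiB, max=%d MiB",
                heap.getUsed() / mib, heap.getCommitted() / mib, heap.getMax() / mib);
    }

    public static void main(String[] args) {
        System.out.println("start:     " + describe(snapshot()));

        // Stand-in for opening a large workbook: allocate 32 blocks of 1 MiB each.
        byte[][] blocks = new byte[32][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }
        System.out.println("allocated: " + describe(snapshot()));

        // Drop the only reference and request (not force) a collection.
        blocks = null;
        System.gc();
        System.out.println("after gc:  " + describe(snapshot()));
    }
}
```

Tools such as the Windows Task Manager report roughly the committed size, not the used size, so a process can look "full" even when most of its heap is free. Whether and how fast the committed value shrinks after the collection depends on the collector and on settings such as the ones listed above.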

Here are the resources I used to figure it out:

Thank you to all who contributed. Cheers!

sapiensl

From the documentation for Runtime.gc():

Runs the garbage collector. Calling this method suggests that the Java virtual machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the virtual machine has made its best effort to recycle all discarded objects.

From my understanding, the JVM doesn't have to recycle anything if it doesn't want to, even if you call Runtime.gc().
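
One way to observe this is to hold an object only through a WeakReference and check whether it is gone after the request. The following is a minimal sketch with made-up names (GcHintDemo and its methods are hypothetical); System.gc() is equivalent to Runtime.getRuntime().gc().

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class: checks whether a GC request actually cleared an object.
class GcHintDemo {

    /** Creates a 1 MiB array that is reachable only through the returned weak reference. */
    static WeakReference<byte[]> allocateUnreachable() {
        return new WeakReference<>(new byte[1024 * 1024]);
    }

    /** Requests a collection and reports whether the referent has been cleared. */
    static boolean collectedAfterHint(WeakReference<?> ref) {
        System.gc(); // a request only; the JVM may ignore it
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = allocateUnreachable();
        System.out.println("collected after System.gc(): " + collectedAfterHint(ref));
    }
}
```

On many JVMs this will report that the array was collected, but nothing guarantees it, and the call can be disabled entirely with -XX:+DisableExplicitGC. Even when the object is collected, that only frees space inside the heap; it says nothing about the heap being handed back to the operating system.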

Amongalen
  • Thank you for the fast reply, much appreciated! Yes, I saw that, but I did not want to miss the chance of it working. Speaking of the problem itself, I found other posts that just told the OP to simply let the variables go out of scope or set them to null, but obviously that did not work either. – sapiensl Dec 14 '18 at 15:44