Having worked lately with a non-ML, Java-heavy program, I feel your pain.
I cannot tell you whether or not to reset the dynamically allocated memory based on a single undeniable technical fact, but my personal experience tells me that if you are going to continue processing in the native R environment after your Java work, you probably should. It is best to control what you can.
Here is why:
The only times I have ever run out of memory (even when working with massive flat files) have been when I was using the JVM in some way. It is not a one-time thing; it has happened often.
It even happens just reading and writing large Excel files through XLConnect, which is Java-driven; the memory gets jammed up very quickly. It seems to be a failure in the way R and Java play with each other.
And R does not automatically garbage-collect the way you would hope. It collects when the OS asks for more memory, but things can get slow long before that happens.
Also, R only sees the objects in memory that it creates itself, not those it merely interprets, so your Java clutter will linger around unbeknownst to R. If the JVM created it, R will not clean it up unless Java does so before going dormant. And if memory is selectively recycled, you can end up with fragmented memory gaps, which hurt performance considerably.
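If you are going through rJava, one thing you can do (a hedged sketch, assuming rJava is loaded and the JVM has been initialised) is ask the JVM to run its own collector alongside R's:

```r
library(rJava)
.jinit()  # initialise the JVM if no package has done so already

# Ask the JVM to run its own garbage collector, then collect on the R side too.
# Note: java.lang.System.gc() is only a request to the JVM, not a guarantee.
jgc <- function() {
  .jcall("java/lang/System", returnSig = "V", method = "gc")
  gc()
}

jgc()
```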
My personal approach has been to create my sets, variables, and frames, subset down to only what I need, then rm() the originals and gc() to remove them and force garbage collection, as sketched below. Then I go on to the next step and do the heavy lifting. If I am running a Java-based package, I do this purging more frequently to keep the memory clean.
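For instance (an illustrative sketch; the file and column names here are hypothetical):

```r
# Read a large file, keep only the slice you need, then purge the original.
big_df  <- read.csv("big_file.csv")                       # hypothetical large file
work_df <- big_df[big_df$keep == 1, c("id", "value")]     # subset to what you need
rm(big_df)  # drop the reference to the large object
gc()        # force garbage collection so the memory is actually released
```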
Once the Java process is done, I use detach("package:yourlibraryname", unload = TRUE) followed by gc() to clear everything out. (Note that detaching a package needs the "package:" prefix.)
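In code, using XLConnect purely as an example package name:

```r
detach("package:XLConnect", unload = TRUE)  # remove the package from the search path
gc()                                        # then force a collection on the R side
```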
If you have adjusted the Java heap settings, this is where I would re-adjust them, lowering the allocation you give to Java's dynamic memory, because as far as I have been able to ascertain, R has no way of taking that memory back while the Java Virtual Machine is still engaged but not operating. So you should reset it and give back to R what is R's to use. I think in the long run this will give you faster processing and fewer lock-ups.
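A sketch of what that re-adjustment looks like, with one caveat worth flagging: options(java.parameters) only takes effect if it is set before the JVM is first initialised, so in practice this means starting the Java-heavy work in a fresh session with a deliberately capped heap:

```r
# Cap the Java heap BEFORE rJava (and therefore the JVM) loads;
# "-Xmx1g" is an example value, not a recommendation.
options(java.parameters = "-Xmx1g")
library(rJava)  # the JVM now starts under the 1 GB ceiling set above
```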
The best way to know how this affects your system as you are using it is to wrap your script in Sys.time() or proc.time() calls and see how long it takes both with and without the forced garbage collections, removals, detachments, and heap reallocation.
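A minimal timing sketch along those lines:

```r
pt <- proc.time()   # snapshot before the heavy step
# ... your heavy lifting here, with or without the purging steps ...
proc.time() - pt    # user/system/elapsed time consumed since the snapshot
```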
You can get a solid grasp of how to do this here:
IDRE - UCLA: proc.time functions
Hope this helps some!