I have Python code that uses a Java library by means of JPype. Currently, each call to my function checks whether a JVM exists and creates one if it does not:
    import jpype as jp

    def myfunc(i):
        if not jp.isJVMStarted():
            jp.startJVM(jp.getDefaultJVMPath(), '-ea',
                        '-Djava.class.path=' + jar_location)
        do_something_hard(i)
Further, I want to parallelize my code using the Python multiprocessing library (via pathos). Each worker (supposedly) operates independently, calculating the value of my function for different parameters. For example:
    import numpy as np
    import pathos

    pool = pathos.multiprocessing.ProcessingPool(8)
    params = np.arange(100)
    result = pool.map(myfunc, params)
This construction works fine, except that it leaks memory dramatically when using more than one core in the pool. I notice that all memory is freed up when Python exits, but memory still accumulates while pool.map is running, which is undesirable. The JPype documentation is incredibly brief; it suggests synchronizing threads by wrapping Python threads with jp.attachThreadToJVM and jp.detachThreadFromJVM, but I cannot find a single example online of how to actually do this. I have tried wrapping the do_something_hard call inside myfunc with these statements, but it had no effect on the leak. I have also attempted to explicitly shut down the JVM at the end of myfunc using jp.shutdownJVM. However, in that case the JVM seems to crash as soon as I use more than one core, leading me to believe that there is a race condition.
Please help:
- What is going on? Why would there be a race condition? Isn't it the case that each worker creates its own JVM?
- What is the correct way to free up memory in my scenario?