tl;dr: I have tasks with huge return values that consume a lot of memory. I'm submitting them to a concurrent.futures.ProcessPoolExecutor. The subprocesses hold onto the memory until they receive a new task. How do I force subprocesses to effectively garbage collect themselves?
Example
import concurrent.futures
executor = concurrent.futures.ProcessPoolExecutor(max_workers=1)
def big_val():
    return [{1:1} for i in range(1, 1000000)]
future = executor.submit(big_val)
# do something with future result
In the above example I'm creating a large object in a subprocess, then working with the result. From that point on I can manage the memory in the parent process, but the subprocess created by my ProcessPoolExecutor holds onto the memory allocated for my task indefinitely.
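To rule out the allocator simply not returning freed pages to the OS, I can check whether the worker really still references the result with a weakref probe. This is a rough sketch: Big, _last_result, and still_alive are names I've invented for illustration, it reuses the executor from the example above, and it relies on max_workers=1 so both tasks land on the same worker.

import weakref

_last_result = None  # worker-side weak reference to the most recent result

class Big:
    def __init__(self):
        self.data = [{1:1} for i in range(1, 1000000)]

def big_val():
    global _last_result
    obj = Big()
    _last_result = weakref.ref(obj)  # observe the result without keeping it alive
    return obj

def still_alive():
    # True if this worker still holds a strong reference to its last result.
    return _last_result is not None and _last_result() is not None

future = executor.submit(big_val)
future.result()                               # the parent now has its own copy
print(executor.submit(still_alive).result())  # True while the worker pins the original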
What I've tried
Honestly, the only thing I can think of is submitting a dummy task:
def donothing():
    pass

executor.submit(donothing)
This works, but it's a) pretty clunky and, more importantly, b) untrustworthy, because I have no guarantee about which subprocess receives a given task; the only foolproof approach is to send a flood of dummy tasks to ensure every subprocess I care about gets one.
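In practice the flood looks something like this (again just a sketch reusing the executor from above; the sleep makes it likelier, not guaranteed, that the dummies spread across all workers rather than one idle worker consuming them all):

import time
import concurrent.futures

def donothing():
    # Sleep briefly so a single idle worker can't swallow every dummy task,
    # making it likelier each worker picks up exactly one.
    time.sleep(0.1)

n_workers = 1  # whatever max_workers the pool was created with
concurrent.futures.wait([executor.submit(donothing) for _ in range(n_workers)])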
As far as I can tell, as soon as a worker process has finished running my task, it has no reason to hold onto the result. If my parent process assigned the returned Future to a local variable, then the moment the task completed the return value would be copied to the Future in the parent, meaning the worker has no further need for it. If my parent process didn't do this, then the return value is effectively discarded anyway.
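Skimming the worker loop in concurrent.futures.process, it seems to be shaped roughly like this (my paraphrase from memory, not the actual source, and newer Python versions may drop the reference explicitly):

# Paraphrase of the worker loop, NOT the real concurrent.futures.process code:
while True:
    call_item = call_queue.get(block=True)  # block until the next task arrives
    r = call_item.fn(*call_item.args, **call_item.kwargs)
    _sendback_result(result_queue, call_item.work_id, result=r)
    del call_item
    # No `del r` here: r stays bound to the (possibly huge) return value
    # until the next task's call rebinds it, pinning the memory in the worker.

If that reading is right, it would explain the behaviour: the result isn't leaked, it's just parked in a local variable that nothing rebinds until another task arrives.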
Am I misunderstanding something here, or is this just an unfortunate quirk of how subprocesses reference memory? If so, is there a better workaround?