I have a task which needs to run a number of subtasks, each on its own VM, and then, when all subtasks are complete, merge the results and present them back to the caller.
I have implemented this using a multiprocessing.Pool, and it's working great.
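For context, the working single-level version looks roughly like this (run_subtask, run_task, and the spec strings are simplified placeholders, not my real code, which actually provisions a VM per subtask):

import multiprocessing

def run_subtask(spec):
    # placeholder for the real work, which runs on its own VM
    return f"result for {spec}"

def run_task(specs):
    # one worker process per subtask; collect the results once they are all done
    with multiprocessing.Pool(processes=len(specs)) as pool:
        return pool.map(run_subtask, specs)

if __name__ == "__main__":
    print(run_task(["a1", "a2", "a3"]))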
I now want to scale up, running multiple of these tasks in parallel.
My initial design was to wrap my task running in another multiprocessing.Pool, where each task runs in its own process, effectively fanning out as follows:
job
  +----- task_a
  |        +------ subtask_a1
  |        +------ subtask_a2
  |        +------ subtask_a3
  +----- task_b
           +------ subtask_b1
           +------ subtask_b2
           +------ subtask_b3
- job starts a multiprocessing.Pool with 2 processes, one for task_a and one for task_b.
- In turn, task_a and task_b each start a multiprocessing.Pool with 3 processes, one for each of their subtasks.
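A stripped-down version of that nested design (same placeholder names as above) looks like this, and running it is what produces the error below:

import multiprocessing

def run_subtask(spec):
    # placeholder for the real per-VM work
    return f"result for {spec}"

def run_task(task_name):
    # inner pool: one process per subtask
    with multiprocessing.Pool(processes=3) as pool:
        return pool.map(run_subtask, [f"{task_name}_{i}" for i in (1, 2, 3)])

if __name__ == "__main__":
    # outer pool: one process per task; creating the inner Pool fails
    # because the outer pool's worker processes are daemonic
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(run_task, ["task_a", "task_b"]))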
When I tried to run my code, I hit an assertion error:
AssertionError: daemonic processes are not allowed to have children
Searching online for details, I found the following thread, an excerpt of which reads:
As for allowing children threads to spawn off children of its own using subprocess runs the risk of creating a little army of zombie 'grandchildren' if either the parent or child threads terminate before the subprocess completes and returns
I have also found workarounds which allow this kind of "pool within a pool" use:
import multiprocessing
import multiprocessing.pool


class NoDaemonProcess(multiprocessing.Process):
    # report this process as non-daemonic, and ignore attempts to change that
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass


class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess


# subclass multiprocessing.pool.Pool, since multiprocessing.Pool is only a factory function
class MyPool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(MyPool, self).__init__(*args, **kwargs)
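If I went this route, the outer pool would presumably just become a MyPool, reusing the placeholder run_task from the sketch above:

if __name__ == "__main__":
    # the outer pool's workers are no longer daemonic, so each task is
    # allowed to create its own inner multiprocessing.Pool
    with MyPool(processes=2) as pool:
        print(pool.map(run_task, ["task_a", "task_b"]))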
However, given the above quote about "zombie grandchildren", it seems this may not be a good design.
So I guess my question is:
- What is the pythonic way to "fan out" multiple processes within multiple processes?