I currently have a piece of code which spawns multiple processes as follows:
from multiprocessing import Pool

pool = Pool(processes=None)
results = [pool.apply(f, args=(arg1, arg2, arg3)) for arg3 in arg_list]
My idea was that this would divide the work across all available cores, since processes=None. However, the documentation for the Pool.apply() method in the multiprocessing module reads:
Equivalent of the apply() built-in function. It blocks until the result is ready, so apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool.
First question: I don't clearly understand this. How does apply distribute the work across workers, and in what way is it different from what apply_async does? If the tasks get distributed across workers, how is it possible that func is only executed in one of the workers?
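If I read the quoted sentence correctly, the difference should be observable in a toy timing experiment like this (my own sketch, not taken from the docs):

import time
from multiprocessing import Pool

def slow(x):
    time.sleep(1)
    return x

if __name__ == '__main__':
    pool = Pool(processes=4)

    # apply blocks on every call, so the sleeps run one after another:
    # roughly 4 seconds in total
    start = time.time()
    [pool.apply(slow, (i,)) for i in range(4)]
    print('apply:      ', time.time() - start)

    # apply_async returns immediately, so all four sleeps overlap:
    # roughly 1 second in total
    start = time.time()
    [res.get() for res in [pool.apply_async(slow, (i,)) for i in range(4)]]
    print('apply_async:', time.time() - start)

    pool.close()
    pool.join()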
My guess: in my current implementation, apply hands a task with one set of arguments to a worker, waits for that worker to finish, and only then hands the next set of arguments to another worker. So I am sending work to different processes, yet no parallelism is taking place. This seems plausible, since apply is in fact just:
def apply(self, func, args=(), kwds={}):
    '''
    Equivalent of `func(*args, **kwds)`.
    Pool must be running.
    '''
    return self.apply_async(func, args, kwds).get()
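If that guess is right, I assume the parallel version of my original loop would look something like this (a minimal sketch; f, arg1, arg2 and arg_list stand in for my real function and data):

from multiprocessing import Pool

def f(arg1, arg2, arg3):
    return arg1 + arg2 + arg3   # stand-in for my real work

if __name__ == '__main__':
    arg1, arg2 = 1, 2
    arg_list = range(10)

    pool = Pool(processes=None)
    # submit all tasks first (non-blocking), then block on the results
    async_results = [pool.apply_async(f, args=(arg1, arg2, arg3))
                     for arg3 in arg_list]
    results = [res.get() for res in async_results]
    pool.close()
    pool.join()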
Second question: I would also like to understand why the introduction of the docs, section 16.6.1.5. ('Using a pool of workers'), says that even a construction with apply_async such as

[pool.apply_async(os.getpid, ()) for i in range(4)]

*may* use more processes, but is not guaranteed to. What decides whether multiple processes will actually be used?
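To check this myself, my plan is to run something like the following and count the distinct PIDs that come back (again my own sketch; the number of distinct PIDs is exactly what I am unsure about):

import os
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(processes=4)
    async_results = [pool.apply_async(os.getpid, ()) for i in range(4)]
    pids = [res.get() for res in async_results]
    pool.close()
    pool.join()
    # if the tasks were spread out, len(set(pids)) > 1; if a single
    # worker was fast enough to grab every task, it could be 1
    print(pids, len(set(pids)))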