I am trying to sketch a picture for myself of how to use a Pool object appropriately.
I have a slightly more complex task, but here's the gist:
    import os
    import numpy as np
    from multiprocessing import Pool

    def func1(x):
        return x * 2

    def func2(x):
        return np.sqrt(x)

    if __name__ == "__main__":
        with Pool(os.cpu_count()) as p:
            x = p.map(func1, range(1000))
            x = p.map(func2, x)
Then there is the documentation for pool.map and pool.join:
map(func, iterable[, chunksize]):
A parallel equivalent of the map() built-in function (it supports only one iterable argument though, for multiple iterables see starmap()). It blocks until the result is ready.
And
join()
Wait for the worker processes to exit. One must call close() or terminate() before using join().
I don't have a strong understanding of what "block" means, but it seems that if I call x = p.map(func1, arg) followed by y = p.map(func2, x), the pool will be strictly assigned to the first task until it is complete, and only then will it be allowed to work on the next task.
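To check that reading, here is a minimal sketch of what I think happens. The function slow_double and the timing are mine, just to make the blocking visible; the assumption is that map() does not return until every result of the first job is ready:

    import os
    import time
    from multiprocessing import Pool

    def slow_double(x):
        time.sleep(0.1)  # simulate work so the job takes measurable time
        return x * 2

    if __name__ == "__main__":
        with Pool(os.cpu_count()) as p:
            t0 = time.monotonic()
            results = p.map(slow_double, range(8))
            elapsed = time.monotonic() - t0
            # If map() blocks, `elapsed` covers the whole first job,
            # and this second map() cannot start before it finished.
            doubled_again = p.map(slow_double, results)
        print(results)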
Question 1: Is that understanding correct?
If my understanding is correct, it seems I don't need to use p.join(), as it appears to do the same thing (block until the pool is finished with its current job).
Question 2: Do I need to use p.join() for a task like this one?
Finally, I see pool.close(), which "Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit". How can more tasks be submitted without me telling it to?
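For concreteness, this is the close()/join() pattern I understand the docs to describe, written without the with block from my snippet above (and with np.sqrt swapped for math.sqrt only so the sketch has no third-party dependency):

    import math
    import os
    from multiprocessing import Pool

    def func1(x):
        return x * 2

    def func2(x):
        return math.sqrt(x)

    if __name__ == "__main__":
        p = Pool(os.cpu_count())
        x = p.map(func1, range(1000))
        x = p.map(func2, x)
        p.close()  # no further tasks may be submitted
        p.join()   # wait for the worker processes to exit

My understanding is that the with block from my original snippet calls terminate() on exit, so close()/join() would matter mainly when managing the pool's lifetime by hand.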
Question 3: Do I need to do anything after all the work is done, like call p.close()?