I am trying to sketch a picture for myself of how to use a Pool object appropriately.
I have a slightly more complex task, but here's the gist:
    import os
    import numpy as np
    from multiprocessing import Pool

    def func1(x):
        return x * 2

    def func2(x):
        return np.sqrt(x)

    if __name__ == "__main__":
        with Pool(os.cpu_count()) as p:
            x = p.map(func1, range(1000))
            x = p.map(func2, x)
Then there is the documentation for pool.map and pool.join:
map(func, iterable[, chunksize]):
A parallel equivalent of the map() built-in function (it supports only one iterable argument though, for multiple iterables see starmap()). It blocks until the result is ready.
And
join()
Wait for the worker processes to exit. One must call close() or terminate() before using join().
I don't have a strong understanding of what "block" means, but it seems that if I call x = p.map(func1, arg) followed by y = p.map(func2, x), the pool will be strictly assigned to the first task until it is complete, and only then will it be allowed to work on the next task.
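To check that reading, here is a minimal sketch of what I think happens. The function slow_double and the timing are mine, just to make the blocking visible; the assumption is that map() does not return until every result of the first job is ready:

    import os
    import time
    from multiprocessing import Pool

    def slow_double(x):
        time.sleep(0.1)  # simulate work so the job takes measurable time
        return x * 2

    if __name__ == "__main__":
        with Pool(os.cpu_count()) as p:
            t0 = time.monotonic()
            results = p.map(slow_double, range(8))
            elapsed = time.monotonic() - t0
            # If map() blocks, `elapsed` covers the whole first job,
            # and this second map() cannot start before it finished.
            doubled_again = p.map(slow_double, results)
        print(results)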
Question 1: Is that understanding correct?
If my understanding is correct, it seems I don't need to use p.join(), as it appears to do the same thing (block until the pool is finished with its current job).
Question 2: Do I need to use p.join() for a task like this one?
Finally, I see pool.close(), which "Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit". How can more tasks be submitted without me telling it to?
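For concreteness, this is the close()/join() pattern I understand the docs to describe, written without the with block from my snippet above (and with np.sqrt swapped for math.sqrt only so the sketch has no third-party dependency):

    import math
    import os
    from multiprocessing import Pool

    def func1(x):
        return x * 2

    def func2(x):
        return math.sqrt(x)

    if __name__ == "__main__":
        p = Pool(os.cpu_count())
        x = p.map(func1, range(1000))
        x = p.map(func2, x)
        p.close()  # no further tasks may be submitted
        p.join()   # wait for the worker processes to exit

My understanding is that the with block from my original snippet calls terminate() on exit, so close()/join() would matter mainly when managing the pool's lifetime by hand.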
Question 3: Do I need to do anything after all the work is done, like call p.close()?