Below is a simple test program, written for and run with Python 3.6.3 on a machine with 4 cores.

import multiprocessing
import time

def f(num):
  print(multiprocessing.current_process(), num)
  time.sleep(1)
  if (num % 2):
    raise Exception


pool = multiprocessing.Pool(5)

try:
  pool.map(f, range(1,20))
except Exception as e:
  print("EXCEPTION")

pool.close()
pool.join()

Output with pool = multiprocessing.Pool(5):

<ForkProcess(ForkPoolWorker-1, started daemon)> 1
<ForkProcess(ForkPoolWorker-2, started daemon)> 2
<ForkProcess(ForkPoolWorker-3, started daemon)> 3
<ForkProcess(ForkPoolWorker-4, started daemon)> 4
<ForkProcess(ForkPoolWorker-5, started daemon)> 5
<ForkProcess(ForkPoolWorker-2, started daemon)> 6
<ForkProcess(ForkPoolWorker-1, started daemon)> 7
<ForkProcess(ForkPoolWorker-4, started daemon)> 8
<ForkProcess(ForkPoolWorker-3, started daemon)> 9
<ForkProcess(ForkPoolWorker-5, started daemon)> 10
<ForkProcess(ForkPoolWorker-2, started daemon)> 11
<ForkProcess(ForkPoolWorker-1, started daemon)> 12
<ForkProcess(ForkPoolWorker-4, started daemon)> 13
<ForkProcess(ForkPoolWorker-3, started daemon)> 14
<ForkProcess(ForkPoolWorker-5, started daemon)> 15
<ForkProcess(ForkPoolWorker-2, started daemon)> 16
<ForkProcess(ForkPoolWorker-1, started daemon)> 17
<ForkProcess(ForkPoolWorker-3, started daemon)> 18
<ForkProcess(ForkPoolWorker-4, started daemon)> 19
EXCEPTION

But if I change the pool's process count to be equal to or less than the number of cores on my machine, nothing is printed for the calls to f() where num is even.

Output with pool = multiprocessing.Pool(4):

<ForkProcess(ForkPoolWorker-1, started daemon)> 1
<ForkProcess(ForkPoolWorker-2, started daemon)> 3
<ForkProcess(ForkPoolWorker-3, started daemon)> 5
<ForkProcess(ForkPoolWorker-2, started daemon)> 7
<ForkProcess(ForkPoolWorker-1, started daemon)> 9
<ForkProcess(ForkPoolWorker-3, started daemon)> 11
<ForkProcess(ForkPoolWorker-3, started daemon)> 13
<ForkProcess(ForkPoolWorker-1, started daemon)> 15
<ForkProcess(ForkPoolWorker-2, started daemon)> 17
<ForkProcess(ForkPoolWorker-1, started daemon)> 19
EXCEPTION

I don't understand why these processes are being killed, especially when the exception isn't even thrown until after the print statement in the function. I really don't understand why it only happens when the process count in the pool is equal to or less than the number of cores on the machine.

martineau

1 Answer

The documentation for multiprocessing.Pool.map lists an optional chunksize argument. If you set it to 1, i.e. pool.map(f, range(1,20), 1), you get the expected result.

If you increase the chunk size (to 6, for example), you might see:

<SpawnProcess(SpawnPoolWorker-1, started daemon)> 1
<SpawnProcess(SpawnPoolWorker-4, started daemon)> 7
<SpawnProcess(SpawnPoolWorker-3, started daemon)> 13
<SpawnProcess(SpawnPoolWorker-2, started daemon)> 19

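This suggests that tasks are handed to a single worker process in the pool in groups of chunksize. When a task raises an exception, the remaining tasks in that chunk are not executed. With a chunk size of 6, the chunks start at 1, 7, 13 and 19; each of those is odd, so f raises on the first task and the rest of the chunk is skipped.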
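
The effect can be reproduced without multiprocessing at all. The following is only a rough sketch: split_into_chunks and items_started are hypothetical helper names, f is a stand-in for the function in the question, and a real pool runs each chunk in a worker process rather than in a plain loop.

```python
from itertools import islice


def f(num):
    """Stand-in for the question's f(): odd inputs raise."""
    if num % 2:
        raise Exception(f"odd input: {num}")
    return num


def split_into_chunks(iterable, chunksize):
    """Yield consecutive tuples of at most `chunksize` items."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def items_started(iterable, chunksize):
    """Return the items on which f() is actually called.

    Each chunk is treated as one task: the first exception ends the
    whole chunk, so the items after it are never reached.
    """
    started = []
    for chunk in split_into_chunks(iterable, chunksize):
        try:
            for item in chunk:
                started.append(item)
                f(item)
        except Exception:
            pass  # the rest of this chunk is skipped
    return started


print(items_started(range(1, 20), 6))  # [1, 7, 13, 19]
print(items_started(range(1, 20), 2))  # odd numbers only
print(items_started(range(1, 20), 1))  # every number from 1 to 19
```

With a chunk size of 2 every chunk begins with an odd number, which matches the Pool(4) output in the question.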

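From this you can tell that the default value of chunksize is 2 in your case. The default is derived from the number of items and the size of the pool, not from the number of cores: for 19 items it works out to 2 with 4 workers and to 1 with 5 workers, which is why Pool(5) printed every number. The parameter exists to reduce the overhead of dispatching tasks to the worker processes one at a time, which can save both resources and processing time when the chunk size is chosen appropriately.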
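
As a sketch of where the default comes from: the function below mirrors, as far as I can tell, the calculation in CPython's Lib/multiprocessing/pool.py, which divides the number of items by four times the pool size and rounds up. default_chunksize is a hypothetical helper name, not part of the library, and the exact formula may differ between Python versions.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunksize Pool.map picks when none is given."""
    if n_items == 0:
        return 0
    # Aim for roughly four chunks per worker, rounding up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


# range(1, 20) has 19 items.
print(default_chunksize(19, 4))  # 2
print(default_chunksize(19, 5))  # 1
```

So the difference between the two runs in the question comes from the pool size crossing the point where the default chunk size drops from 2 to 1, not from the number of cores.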

Valen
  • Nice answer. To elaborate a bit - calling `.map()` spawns the tasks with `.submit()`, but it doesn't join them (unless you use `pool` as a context manager - then the joining will happen on `__exit__`). Hence the exceptions do happen but don't "show up" because `.map()` returns an iterator over Future objects. – Brad Solomon Jan 10 '19 at 21:39
  • Also, as you point out, the chunksize is 2 in this specific case and the calculation for that is found [here](https://github.com/python/cpython/blob/master/Lib/multiprocessing/pool.py#L413), with some explanation [here](https://stackoverflow.com/q/53751050/7954504) – Brad Solomon Jan 10 '19 at 21:40