I have a Python script that concurrently processes numpy arrays and images in a random way. To get proper randomness inside the spawned processes, I pass a random seed from the main process to each worker so it can seed itself.
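For context, the seeding in my real script looks roughly like this (a simplified sketch; the actual array/image processing and seed handling are omitted):

from multiprocessing import Pool
import numpy as np

def worker(args):
    seed, data = args
    # The worker seeds the global numpy generator with the seed it was handed
    np.random.seed(seed)
    # ... random processing of the numpy array / image would go here ...
    return np.random.randint(10)

ppool = Pool(4)
# The main process draws one seed per task and passes it along with the data
seeds = [np.random.randint(0, 2**31 - 1) for _ in range(8)]
tasks = [(s, None) for s in seeds]
print ppool.map(worker, tasks)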
When I use maxtasksperchild for the Pool, my script hangs after running Pool.map a number of times.
The following is a minimal snippet that reproduces the problem:
# This code stops after multiprocessing.Pool workers are replaced one single time.
# They are replaced due to maxtasksperchild parameter to Pool
from multiprocessing import Pool
import numpy as np

def worker(n):
    # Removing np.random.seed solves the issue
    np.random.seed(1)  # any seed value
    return 1234  # trivial return value

# Removing maxtasksperchild solves the issue
ppool = Pool(20, maxtasksperchild=5)

i = 0
while True:
    i += 1
    # Removing np.random.randint(10) or taking it out of the loop solves the issue
    rand = np.random.randint(10)
    l = [3]  # trivial input to ppool.map
    result = ppool.map(worker, l)
    print i, result[0]
This is the output
1 1234
2 1234
3 1234
.
.
.
99 1234
100 1234 # at this point workers should've reached maxtasksperchild tasks
101 1234
102 1234
103 1234
104 1234
105 1234
106 1234
107 1234
108 1234
109 1234
110 1234
and then it hangs indefinitely.
I could potentially replace numpy.random with Python's random and work around the problem. However, in my actual application the worker executes user code (passed as an argument to the worker) which I have no control over, and I would like to allow that user code to use numpy.random functions. So I intentionally want to seed the global random generator (for each process independently).
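Roughly, I am after something like the sketch below (names like init_worker are placeholders, and deriving the per-process seed from the PID is just one possibility):

import os
from multiprocessing import Pool
import numpy as np

def init_worker(base_seed):
    # Runs once when each worker process starts: seed the process's global
    # numpy generator so every worker gets an independent random stream
    np.random.seed(base_seed + os.getpid())

def worker(n):
    # Stands in for user code that calls numpy.random functions directly
    return np.random.randint(10)

ppool = Pool(4, initializer=init_worker, initargs=(1234,))
print ppool.map(worker, range(8))

As far as I can tell, the initializer should also run in the replacement workers that maxtasksperchild spawns, which is exactly the point where my script hangs.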
This was tested with Python 2.7.10 and numpy 1.11.0, 1.12.0 & 1.13.0, on Ubuntu and OSX.