I have a pool of workers that perform the same task, and I send each one a distinct clone of the same data object. I then measure the run time of each process separately, inside the worker function.
With one process, run time is 4 seconds. With 3 processes, the run time for each process goes up to 6 seconds.
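In terms of overall throughput, assuming the three 6-second tasks really do run concurrently, three tasks take about 6 s instead of 3 × 4 s = 12 s run serially. That is a speedup of 2 on 3 processes rather than the ideal 3:

```latex
S = \frac{3 \times 4\,\text{s}}{6\,\text{s}} = 2, \qquad E = \frac{S}{3} = \frac{2}{3} \approx 67\%
```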
With more complex tasks, the increase is even more pronounced.
There are no other CPU-hogging processes running on my system, and as far as I can tell the workers don't use shared memory. Because the run times are measured inside the worker function, I assume the forking overhead shouldn't matter.
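One way to narrow this down (a diagnostic sketch, not what I currently run) is to record CPU time alongside wall-clock time inside each worker. `busy_work` and `timed_worker` are hypothetical names; `busy_work` is a pure-Python stand-in for `data.process()`. `time.process_time()` counts CPU time across all threads of the process, so a CPU/wall ratio well above 1 would point at hidden multithreading inside the task, while a ratio well below 1 would suggest the worker is waiting, e.g. for a free core.

```python
import time
from multiprocessing import Pool


def busy_work(n):
    """Hypothetical stand-in for data.process(): a pure-Python CPU loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_worker(n):
    # perf_counter() measures wall-clock time; process_time() measures CPU
    # time consumed by this process, summed over all of its threads.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    busy_work(n)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        for wall, cpu in pool.map(timed_worker, [2_000_000] * 3):
            # ratio >> 1: the worker runs several threads;
            # ratio << 1: the worker spends time not running (waiting).
            print(f"wall={wall:.2f}s cpu={cpu:.2f}s ratio={cpu / wall:.2f}")
```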
Why does this happen?
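```python
from time import time
from multiprocessing import Pool
import pickle


def worker_fn(data):
    t1 = time()
    data.process()
    print(time() - t1)  # per-process run time, measured inside the worker
    return data.results


def main(n, num_procs=3):
    pool = Pool(processes=num_procs)
    data = MyClass()
    # Pickle once, then unpickle n times to get n independent copies
    data_pickle = pickle.dumps(data)
    list_data = [pickle.loads(data_pickle) for _ in range(n)]
    results = pool.map(worker_fn, list_data)
    pool.close()
    pool.join()
    return results
```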
Edit: I can't post the full code for `MyClass`, but it performs a lot of NumPy matrix operations. I suspect NumPy's use of OpenBLAS may somehow be to blame.
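If that suspicion is right, the likely mechanism is oversubscription: OpenBLAS typically starts a thread pool sized to the number of cores in every process that loads it, so three workers may end up competing with three times as many BLAS threads as there are cores. A minimal sketch of the usual mitigation, assuming the backend reads these variables when it loads, is to cap the thread count through environment variables before NumPy is first imported. `limit_blas_threads` is a hypothetical helper name.

```python
import os

# Environment variables read by common BLAS/OpenMP backends when they load.
BLAS_THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",  # OpenBLAS
    "OMP_NUM_THREADS",       # OpenMP runtimes
    "MKL_NUM_THREADS",       # Intel MKL
)


def limit_blas_threads(n=1):
    """Cap BLAS threads per process; only effective before numpy is imported."""
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in BLAS_THREAD_VARS}


if __name__ == "__main__":
    print(limit_blas_threads(1))
    # import numpy here, only after the limits are in place
```

The call has to come at the very top of the main script, before `import numpy`; worker processes then inherit the setting whether they are forked or spawned. The third-party `threadpoolctl` package's `threadpool_limits` context manager is an alternative for changing the limit at runtime.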