The problem is that t_dict is passed as part of the partial function f_args. Partial functions are instances of <class 'functools.partial'>. When you create the partial, it gets a reference to test and to the empty dictionary in mod. Every time you call f_args, that one dictionary stored on the partial object is modified. This is easier to spot with a list in a single process.
>>> import functools
>>> def foo(name, t_list):
...     t_list.append(name)
...     return t_list
...
>>> mod = []
>>> f = functools.partial(foo, t_list=mod)
>>> f(0)
[0]
>>> f(1)
[0, 1]
>>> f(2)
[0, 1, 2]
>>> mod
[0, 1, 2]
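The same thing happens with a dict. Here is a minimal sketch in the same vein (bar and g are illustrative names standing in for your test and f_args):

>>> def bar(name, t_dict):
...     t_dict['a'] = name
...     return t_dict
...
>>> mod = {}
>>> g = functools.partial(bar, t_dict=mod)
>>> [g(i) for i in range(3)]
[{'a': 2}, {'a': 2}, {'a': 2}]

Every element of that list is the same dict object, so they all show the last value written.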
When you pool.map(f_args, iterator), f_args is pickled and sent to each subprocess to be the worker. So each subprocess has a unique copy of the dictionary that will be updated for every iterated value the subprocess happens to get.
For efficiency, multiprocessing will chunk data. That is, each subprocess is handed a list of iterated values that it processes into a list of responses to return as a group. But since each response references the same single dict, when the chunk is returned to the parent all of the responses hold only the final value set. If 0, 1, 2 were processed, the return is 2, 2, 2.
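You can reproduce this end to end. A minimal sketch, assuming a setup like yours (test_shared and the explicit chunksize are illustrative, not your actual code):

import functools
import multiprocessing as mp

def test_shared(name, t_dict):
    t_dict['a'] = name
    return t_dict          # returns the dict carried by the partial

if __name__ == '__main__':
    mod = {}
    f_args = functools.partial(test_shared, t_dict=mod)
    with mp.Pool(2) as pool:
        # two chunks of three values each; every result in a chunk
        # points at that chunk's copy of the dict
        res = pool.map(f_args, range(6), chunksize=3)
    print(res)
    # e.g. [{'a': 2}, {'a': 2}, {'a': 2}, {'a': 5}, {'a': 5}, {'a': 5}]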
The solution will depend on your data. It's expensive to pass data back and forth between the pool processes and the parent, so ideally the data is generated completely in the worker. In this case, ditch the partial and have the worker create the dict.
It's likely your situation is more complicated than this.
import multiprocessing as mp
import numpy as np

def test(name):
    # each call builds and returns its own dict
    return {'a': name}

def mp_func(func, iterator, **kwargs):
    pool = mp.Pool(mp.cpu_count())
    res = pool.map(func, iterator)
    pool.close()
    return res

if __name__ == '__main__':
    m = 33
    res = mp_func(func=test, iterator=np.arange(m))
    for di in res:
        print(di['a'])
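If you really do need to pass some pre-built settings through a partial, one option (a sketch, not the only way, assuming your original test took name and t_dict) is to copy the dict inside the worker before mutating it, so each returned dict is a distinct object:

import copy
import functools
import multiprocessing as mp

def test(name, t_dict):
    d = copy.deepcopy(t_dict)   # private copy per call, not the shared reference
    d['a'] = name
    return d

if __name__ == '__main__':
    f_args = functools.partial(test, t_dict={'b': 'some fixed setting'})
    with mp.Pool(mp.cpu_count()) as pool:
        res = pool.map(f_args, range(5))
    print(res)
    # [{'b': 'some fixed setting', 'a': 0}, ..., {'b': 'some fixed setting', 'a': 4}]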