I have a function f(x) that I want to evaluate over a list of values xrange in parallel. The function does something like this:
import numpy as np

def f(x, wrange, dict1, dict2):
    out_list = []
    v1 = dict1[x]              # matrix keyed by x
    for w in wrange:
        v2 = dict2[x - w]      # vector keyed by x - w
        out_list.append(np.dot(v1, v2))
    return out_list
It takes a matrix from the dictionary dict1 and a vector from the dictionary dict2, then multiplies them together.
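For concreteness, the inputs I have in mind look roughly like this (the shapes and key ranges here are only illustrative, not my real data):

import numpy as np

xrange = range(50)    # keys to evaluate f over
wrange = range(10)    # offsets used inside f

# dict2 keys must cover x - w for every x in xrange and w in wrange
dict1 = {x: np.random.rand(100, 100) for x in xrange}
dict2 = {k: np.random.rand(100) for k in range(-9, 50)}

f(0, wrange, dict1, dict2)   # returns a list of len(wrange) vectors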
Now my normal approach for doing this in parallel would be something like this:
import functools
import multiprocessing

par_func = functools.partial(f, wrange=wrange, dict1=dict1, dict2=dict2)
p = multiprocessing.Pool(4)
ssdat = p.map(par_func, xrange)   # evaluate f at each x in xrange
p.close()
p.join()
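One thing I notice is that par_func drags the dictionaries along with it: pickling it serializes all of dict1 and dict2, which is easy to check directly (a quick sanity check, assuming the illustrative setup above):

import pickle

# the pickled partial contains full copies of dict1 and dict2
print(len(pickle.dumps(par_func)))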
Now when dict1 and dict2 are big dictionaries, this causes the code to fail with the error:
File "/anaconda3/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
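As far as I can tell, the struct.error itself is just the 32-bit message-length header overflowing, which can be reproduced in isolation:

import struct

struct.pack("!i", 2**31 - 1)   # fine: largest value a signed 32-bit int can hold
struct.pack("!i", 2**31)       # raises the same struct.error as above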
I think this happens because Pool is pickling copies of dict1 and dict2 for every evaluation of my function, and the pickled payload overflows that 32-bit length header. Is there an efficient way, instead, to set these dictionaries up as shared-memory objects? And is map the best function to do this?
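For reference, the kind of thing I am imagining is a sketch like the one below, which relies on the default fork start method on Linux so that workers inherit module-level globals instead of receiving pickled copies (the data setup is the same illustrative one as above, and I have not tested this at scale):

import multiprocessing
import numpy as np

# Module-level globals: with the "fork" start method (the default on Linux),
# worker processes inherit these without pickling them per task.
xrange = range(50)
wrange = range(10)
dict1 = {x: np.random.rand(100, 100) for x in xrange}
dict2 = {k: np.random.rand(100) for k in range(-9, 50)}

def f(x):
    # read the inherited globals rather than taking them as arguments
    v1 = dict1[x]
    return [np.dot(v1, dict2[x - w]) for w in wrange]

if __name__ == "__main__":
    with multiprocessing.Pool(4) as p:
        ssdat = p.map(f, xrange)   # only the x values cross the process boundary

I am not sure whether copy-on-write holds up once Python's reference counting starts touching the pages, which is part of why I am asking.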