I am trying to speed up some Python code that can only run single-threaded. I am running it many times in a for loop and would like to parallelize the loop and save the results in a dictionary.
I've searched Stack Overflow and read the multiprocessing documentation, but can't find a good solution.
Example of the non-parallelized version:
%%time
# This only uses one thread! It's slow
mydict = {}
for i in range(20000000):
    mydict[i] = i**2
Returns:
CPU times: user 8.13 s, sys: 1.04 s, total: 9.17 s
Wall time: 9.21 s
and the dictionary is correct:
print([mydict[i] for i in range(10)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
My attempt at parallelizing:
%%time
import multiprocessing as mp
from multiprocessing import Process, Manager
def square(d, i):
    d[i] = i**2
with mp.Manager() as manager:
    d = manager.dict()
    with manager.Pool(processes=4) as pool:
        pool.map(square, (d, range(20000000)))
Returns:
TypeError: square() missing 1 required positional argument: 'i'
The expected result is the same correct dictionary, with a wall time of roughly 1/4 of 9.21 s.
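For reference, a minimal sketch of one possible way around the TypeError. `pool.map` passes a single item from its iterable to each call, so a two-argument `square(d, i)` never receives `i`. The sketch assumes the per-item work can be written as a pure one-argument function that returns its result, so no shared `manager.dict()` is needed and the dictionary is assembled in the parent process. The name `parallel_squares` and the `chunksize` value are illustrative choices, not part of the original code.

```python
import multiprocessing as mp


def square(i):
    # Pure function: takes one item and returns one result (no shared state).
    return i ** 2


def parallel_squares(n, processes=4, chunksize=10_000):
    # Hypothetical helper. Pool.map calls square(item) once per item and
    # returns the results in the same order as the input iterable.
    with mp.Pool(processes=processes) as pool:
        results = pool.map(square, range(n), chunksize=chunksize)
    # Build the dictionary in the parent: key i maps to the i-th result.
    return dict(enumerate(results))


if __name__ == "__main__":
    # Small n for a quick check; the guard matters on platforms that
    # start worker processes by re-importing the main module.
    d = parallel_squares(1000)
    print([d[i] for i in range(10)])
```

For work as cheap as `i**2`, the cost of sending items and results between processes may outweigh the computation, so a 4x speedup should not be assumed; the approach is more likely to pay off when each call does substantial work.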