I've searched Python 3 Docs, example paralized code, and threads regarding setting the chunksize argument, but I am still at a loss.
I have multiple large arrays that are passed as arguments to a function that I need to parallelize. After I zip the function arguments to make it an iteratable, the starmap function seems to pass the arguments with the chunksize = 1, even when I pass a different chunksize argument to starmap. Ideally, I need to chop up my arrays into X chunks, and send them to X CPUs. What am I doing wrong?
I used the following example code
import multiprocessing
import numpy as np
def square(x, y):
if type(x) == int or type(y) == int:
print(x, y, 'Passed one pair of values')
return np.multiply(x,y)
a = range(100)
b = range(100)
use_num_cpu = multiprocessing.cpu_count()-1
pl = multiprocessing.Pool(processes = use_num_cpu)
args = zip(a, b)
M = pl.starmap(func = square, iterable = args, chunksize = 10)
The output looks like this:
0 0 Passed one pair of values
10 10 Passed one pair of values
1 1 Passed one pair of values
11 11 Passed one pair of values
20 20 Passed one pair of values
2 2 Passed one pair of values
12 12 Passed one pair of values
13 13 Passed one pair of values
3 3 Passed one pair of values
14 14 Passed one pair of values
15 15 Passed one pair of values
4 4 Passed one pair of values
16 16 Passed one pair of values
5 5 Passed one pair of values
...
90 90 Passed one pair of values
77 77 Passed one pair of values
78 78 Passed one pair of values
91 91 Passed one pair of values
79 79 Passed one pair of values
92 92 Passed one pair of values
93 93 Passed one pair of values
94 94 Passed one pair of values
95 95 Passed one pair of values
96 96 Passed one pair of values
97 97 Passed one pair of values
98 98 Passed one pair of values
99 99 Passed one pair of values