After reading @Darkonaut's answer here, I am still having trouble getting batches of my iterable passed to the function being multiprocessed.
My setup:
$ python --version
Python 3.6.7 :: Anaconda, Inc.
OS X 10.15.7
Also tested on RHEL 8; same result.
Here is a minimal example, with the output below:
import multiprocessing
from itertools import repeat

iterable = list(range(1000))

def dummy_func(arg1, arg2):
    if hasattr(arg1, "__len__"):
        print(f"Batch size: {len(arg1)}")
    else:
        print("Single item sent to func.")
    print(f"Arg1: {arg1}, Arg2: {arg2}")

with multiprocessing.Pool(processes=8) as pool:
    pool.starmap(
        dummy_func,
        zip(
            iterable,
            repeat("Static second variable")),
        chunksize=10)
    pool.close()
    pool.join()
No matter whether chunksize is None or any number greater than 1, it always sends a single item. Output:
Single item sent to func.
Arg1: 0, Arg2: Static second variable
Single item sent to func.
Arg1: 1, Arg2: Static second variable
Single item sent to func.
Arg1: 2, Arg2: Static second variable
...
Arg1: 989, Arg2: Static second variable
My expectation was that arg1 would be a list of chunksize items. Is that not correct?
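If it turns out that chunksize only controls how many tasks are shipped to each worker at a time, and the function is still called once per item, then the workaround I am hoping to avoid is batching the iterable myself before handing it to starmap. A minimal sketch of that (the chunked helper here is my own, not from any library):

import multiprocessing
from itertools import repeat

iterable = list(range(1000))

def chunked(seq, size):
    # My own helper: yield successive slices of `seq`,
    # each of length `size` (the last one may be shorter).
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def dummy_func(arg1, arg2):
    # With manual batching, arg1 really is a list,
    # so this prints "Batch size: 10" for each call.
    print(f"Batch size: {len(arg1)}")

with multiprocessing.Pool(processes=8) as pool:
    pool.starmap(
        dummy_func,
        zip(chunked(iterable, 10), repeat("Static second variable")))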