I am doing some calculations on large collections of bytes; the work runs on chunks of bytes, and I am trying to speed it up with the multiprocessing module. Initially I tried pool.map, but it only accepts a single argument, so I found out about pool.starmap. However, pool.starmap only returns results once all the processes have finished, and I want results as they arrive (more or less). pool.imap does yield results as processes finish, but it does not allow multiple arguments (my function requires two). The order of the results also matters.
Some sample code below:
import multiprocessing as mp
from itertools import repeat

pool = mp.Pool(processes=4)
y = []
# zip(da, repeat(db)) pairs each chunk with the constant second argument
for x in pool.starmap(f, zip(da, repeat(db))):
    y.append(x)
The above code works, but only returns results once all the processes have completed, so I cannot see any progress. That is why I tried pool.imap, which works well but only with a single argument:
pool = mp.Pool(processes=4)
y = []
for x in pool.imap(f, da):
    y.append(x)
With multiple arguments it raises the following exception:
TypeError: f() missing 1 required positional argument: 'd'
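A minimal, runnable example that reproduces the error (f, da, and db here are dummy stand-ins for my real function and data):

```python
import multiprocessing as mp

def f(c, d):
    # Dummy two-argument function standing in for my real chunk processor.
    return c + d

if __name__ == "__main__":
    da = list(range(10))  # placeholder for my chunks
    db = 100              # placeholder for the constant second argument
    with mp.Pool(processes=4) as pool:
        # imap passes each item of da as the ONLY argument to f,
        # so d is never supplied and the worker raises the TypeError above.
        for x in pool.imap(f, da):
            print(x)
```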
Looking for a simple way to achieve all 3 requirements:
- parallel processing with multiple parameters/arguments
- visible progress while the processes are running
- ordered results.
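For reference, one workaround I am considering (a sketch; f, da, and db are dummy placeholders, and it assumes the second argument is the same for every call): binding the constant argument with functools.partial so that imap can be used. Is this the idiomatic way, or is there something simpler?

```python
import multiprocessing as mp
from functools import partial

def f(a, b):
    # Dummy stand-in for my real two-argument chunk function.
    return a + b

if __name__ == "__main__":
    da = list(range(10))  # placeholder for my chunks
    db = 100              # placeholder for the constant second argument
    with mp.Pool(processes=4) as pool:
        y = []
        # partial(f, b=db) turns f into a one-argument callable,
        # so imap yields results as workers finish, in input order.
        for x in pool.imap(partial(f, b=db), da):
            y.append(x)  # progress is visible here, one result at a time
    print(y)
```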
Thanks!