I have a big list of images list_img, say 20k, that I need to process multiple times with changing arguments out of a list params = [arg1, arg2, ...]. Ideally, I want to use multiple processes to do so. But I need all processes to first use arg1 and then arg2 on chunks of my list list_img. The processing time for each arg in params varies greatly, so if I distributed the list params over my processes instead of the list of images (core 1: arg1, core 2: arg2, ...), after a while most of the processes would be idle (finished) while very few were still crunching data.
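For concreteness, the layout I am describing (and rejecting) would be something like the following sketch, where process_param is a made-up helper; a worker that draws a cheap parameter finishes early and sits idle while workers holding expensive parameters keep running:

from multiprocessing import Pool

import numpy as np

# module-level so worker processes can see it (fork) or rebuild it (spawn)
list_img = [np.ones((100, 100))] * 20000  # for demo only

def process_param(par):
    # one worker grinds through the whole image list for a single
    # parameter; a cheap parameter leaves its worker idle early
    return [par * img for img in list_img]

if __name__ == "__main__":
    params = list(range(100))  # for demo only
    with Pool(processes=8) as pool:
        all_results = pool.map(process_param, params)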
My current (working) solution looks like this:
from multiprocessing import Pool

import numpy as np

def calc_image(argument, image):
    val = argument * image  # not the real process, just demo
    return val

if __name__ == "__main__":
    pool = Pool(processes=8)
    list_img = [np.ones((100, 100))] * 20000  # for demo only
    params = list(range(100))  # for demo only
    for par in params:
        par_list = [par] * len(list_img)
        return_vals = pool.starmap(calc_image, zip(par_list, list_img))
    pool.close()
How can I avoid copying the list list_img every time the variable par changes in the for-loop? As far as I can tell, every starmap call pickles all 20k images and sends them to the workers again just because par changed. I would also like to avoid using global variables, if possible.
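To make that constraint concrete: a Pool initializer could hand the images to each worker once (a minimal sketch; init_worker, worker_imgs, and calc_image_idx are names I made up for illustration), but the worker_imgs variable it needs is effectively a per-worker global, which is exactly what I would like to avoid:

from multiprocessing import Pool

import numpy as np

def init_worker(images):
    # runs once per worker process: stash the images in a module-level
    # variable so later tasks only need to receive (par, index) pairs
    global worker_imgs
    worker_imgs = images

def calc_image_idx(args):
    par, idx = args
    return par * worker_imgs[idx]

if __name__ == "__main__":
    list_img = [np.ones((100, 100))] * 20000  # for demo only
    params = list(range(100))  # for demo only
    with Pool(processes=8, initializer=init_worker,
              initargs=(list_img,)) as pool:
        for par in params:
            return_vals = pool.map(calc_image_idx,
                                   [(par, i) for i in range(len(list_img))])

Here the images cross the process boundary once per worker instead of once per parameter, but the module-level state remains. Is there a cleaner way?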