I am doing some repetitive tasks on a large number of files, and I'd like to run these tasks in parallel.
Each task is in a function that looks like:
def function(file):
    ...
    return var1, var2, ...
I managed to run all of this in parallel using:
import concurrent.futures as Cfut
executor = Cfut.ProcessPoolExecutor(Nworkers)
futures = [executor.submit(function, file) for file in list_files]
Cfut.wait(futures)
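For reference, each `Future` returned by `submit` already holds the function's return value: `Future.result()` blocks until the task finishes and gives back the returned tuple. A minimal sketch of collecting them (the `work` function and its inputs are placeholders; I use `ThreadPoolExecutor` only so the snippet runs as-is, since the result-retrieval API is identical for `ProcessPoolExecutor`, which additionally needs an `if __name__ == "__main__"` guard on some platforms):

```python
import concurrent.futures as Cfut

def work(x):
    # stand-in for the real per-file function; returns a tuple like (var1, var2)
    return x, x * 2

def run_all(items, nworkers=2):
    # ThreadPoolExecutor so the sketch runs without a __main__ guard;
    # swap in ProcessPoolExecutor for real CPU-bound work
    with Cfut.ThreadPoolExecutor(nworkers) as executor:
        futures = [executor.submit(work, item) for item in items]
        Cfut.wait(futures)
        # the futures list preserves submission order, so results
        # line up with the input items
        return [f.result() for f in futures]

print(run_all([1, 2, 3]))  # [(1, 2), (2, 4), (3, 6)]
```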
What I want to do is:
- Find a way to get var1, var2, etc. back into another variable.
- Write a function that handles the whole parallelizing process.
- Since each task is very quick on its own, run the tasks in groups.
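For what it's worth, `Executor.map` already covers much of this: it returns results in input order, and with `ProcessPoolExecutor` its `chunksize` argument batches quick tasks into groups for you. A sketch under placeholder names (`ThreadPoolExecutor` is used only so it runs as-is; it accepts but ignores `chunksize`):

```python
import concurrent.futures as Cfut

def work(x):
    # stand-in for the real per-file task
    return x * x

def run_mapped(items, nworkers=2, chunksize=1):
    # chunksize hands each worker a batch of items at a time, which is
    # honoured by ProcessPoolExecutor (ThreadPoolExecutor ignores it)
    with Cfut.ThreadPoolExecutor(nworkers) as executor:
        # map yields results in the same order as the input iterable
        return list(executor.map(work, items, chunksize=chunksize))

print(run_mapped([1, 2, 3], chunksize=2))  # [1, 4, 9]
```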
Here is what I have written so far:
def function(files):
    for file in files:
        ...
        print('var1', 'var2', ...)
def multiprocess_loop_grouped(function, param_list, group_size, Nworkers):
    # function : function that runs in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers : number of groups/items running at the same time
    executor = Cfut.ProcessPoolExecutor(Nworkers)
    futures = [executor.submit(function, param)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
If I just print var1, var2, etc., it works, but I need to collect these results into an array or something similar.
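Assuming `grouper` is the usual itertools recipe (it is not defined above), the wrapper only needs two changes: the grouped function should `return` its per-file tuples instead of printing them, and the wrapper should collect `f.result()` from each future. A sketch under those assumptions, with placeholder per-file work (again using `ThreadPoolExecutor` so it runs as-is; `ProcessPoolExecutor` works the same way behind a `__main__` guard):

```python
import concurrent.futures as Cfut
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # classic itertools recipe: collect data into fixed-length chunks
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def function(files):
    # return the per-file tuples instead of printing them
    out = []
    for file in files:
        if file is None:            # skip the fill values grouper pads with
            continue
        out.append((file, len(file)))  # stand-in for the real work
    return out

def multiprocess_loop_grouped(function, param_list, group_size, nworkers):
    with Cfut.ThreadPoolExecutor(nworkers) as executor:
        futures = [executor.submit(function, group)
                   for group in grouper(param_list, group_size)]
        Cfut.wait(futures)
        # flatten each group's list of result tuples into one list,
        # in submission order
        return [item for f in futures for item in f.result()]

print(multiprocess_loop_grouped(function, ['a', 'bb', 'ccc'], 2, 2))
# [('a', 1), ('bb', 2), ('ccc', 3)]
```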