I have written some code to parallelize the processing of some data in a Jupyter notebook.
It consists of a function that takes some data as input, transforms it, and writes the result to a file:
%%writefile my_functions.py
import pickle

def my_function(f):
    d = f * 10
    # Name the output file after the input value; `out` avoids shadowing f.
    with open(f"{f}.p", "wb") as out:
        pickle.dump(d, out, pickle.HIGHEST_PROTOCOL)
The function is then called from the main code:
from multiprocess import Pool
from my_functions import my_function
from tqdm import tqdm

values_list = [0, 1, 2, 3, 4, 5, 6]
max_pool = 5
factor = 10

with Pool(max_pool) as p:
    pool_outputs = list(
        tqdm(
            p.imap(my_function, values_list),
            total=len(values_list),
        )
    )
How can I modify the code to pass additional arguments to my_function? For example, suppose I want to pass the value of a variable v:
%%writefile my_functions.py
import pickle

def my_function(f, v):
    d = f * v
    # One file per input value, so parallel calls do not overwrite each other.
    with open(f"{f}.p", "wb") as out:
        pickle.dump(d, out, pickle.HIGHEST_PROTOCOL)
How can I modify the call to p.imap accordingly?
Following other solutions for multiprocessing (e.g. this one), I tried
p.imap(my_function, zip(values_list, repeat(factor)))
and p.imap(my_function(factor), values_list),
but neither of them worked.
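A likely reason both attempts fail is that imap calls the function with each item as a single positional argument, just as the built-in map does with one iterable. The sketch below uses map as a stand-in for p.imap so it runs without a pool. Here my_function is simplified to return its result instead of writing a file, and my_function_star is a hypothetical helper name, not something from the original code.

```python
from itertools import repeat


def my_function(f, v):
    # Same signature as in the question, simplified to return the product.
    return f * v


values_list = [0, 1, 2]
factor = 10

# Attempt 1: every item produced by zip is ONE tuple, so the call becomes
# my_function((0, 10)) and the parameter v is never supplied.
try:
    list(map(my_function, zip(values_list, repeat(factor))))
    attempt_1 = "ok"
except TypeError as exc:
    attempt_1 = f"TypeError: {exc}"

# Attempt 2: my_function(factor) is evaluated right away, before map (or
# imap) is even called, and it is also missing the parameter v.
try:
    list(map(my_function(factor), values_list))
    attempt_2 = "ok"
except TypeError as exc:
    attempt_2 = f"TypeError: {exc}"


def my_function_star(args):
    # Hypothetical wrapper: unpack the (f, v) tuple into two arguments.
    return my_function(*args)


# With the wrapper, the zip-based call from attempt 1 works.
fixed = list(map(my_function_star, zip(values_list, repeat(factor))))

print(attempt_1)
print(attempt_2)
print(fixed)
```

With a real pool, the wrapper would have to live in my_functions.py next to my_function so that worker processes can import it.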
Note: I am not bound to multiprocess. If you know of solutions that use other packages, I am open to them.
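One possible approach, sketched here under some assumptions, is to bind the extra argument with functools.partial and pass the resulting callable to imap. The sketch uses the standard library's multiprocessing.pool.ThreadPool, which offers the same imap interface as Pool, only so that it runs anywhere, including inside a single notebook cell. The out_dir parameter and the return value are hypothetical additions for illustration, and the tqdm wrapper from the question is left out for brevity.

```python
import pickle
import tempfile
from functools import partial
from multiprocessing.pool import ThreadPool
from pathlib import Path


def my_function(f, v, out_dir):
    # out_dir is a hypothetical extra parameter so files land in a temp folder.
    d = f * v
    with open(Path(out_dir) / f"{f}.p", "wb") as out:
        pickle.dump(d, out, pickle.HIGHEST_PROTOCOL)
    return d  # returned only so the results can be inspected


values_list = [0, 1, 2, 3, 4, 5, 6]
max_pool = 5
factor = 10
out_dir = tempfile.mkdtemp()

# partial fixes v and out_dir by keyword; imap then supplies only f.
worker = partial(my_function, v=factor, out_dir=out_dir)

with ThreadPool(max_pool) as p:
    # imap yields results in the same order as values_list.
    pool_outputs = list(p.imap(worker, values_list))

print(pool_outputs)
```

The same partial object should also work with a process-based Pool, since partial objects can be pickled when the wrapped function is importable (for example from my_functions.py) and the bound arguments are picklable.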