I would like to parallelize a function using `multiprocessing`. My function takes two arguments: a data frame and a list of values.
The values in the list are searched for in the data frame, and each matching row is collected into another data frame that is returned. The function should receive the data frame intact (not split across processes), while the list should be divided among the worker processes. How can I pass two arguments to a function with multiprocessing, splitting one argument among the processes and keeping the other intact? Here is my current (non-working) attempt:
import pandas as pd
import multiprocessing as mp
def afunction(df, alist):
    # search df for the values in alist and collect the matching rows
    return another_df

pool = mp.Pool(processes=4)
# this is where I am stuck -- Pool.map() maps over a single iterable
# and has no `args` keyword:
results = pool.map(afunction, args=(df, alist))