I recently started using multiprocessing when mapping complex functions over a pandas dataframe. For example, if I want to create a new column based on the values of another column, I can do:
import multiprocessing as mp
import seaborn as sns

iris = sns.load_dataset('iris')

# example of a "complex function" returning some array
def function_1(val_):
    return [1] * round(val_)

with mp.Pool(mp.cpu_count()) as pool:
    iris['test_1'] = pool.map(function_1, iris['petal_length'])
This is much faster than using just apply with a lambda function.
If I have a function that takes several other columns of the dataframe as input (plus some parameters), I would normally apply it like this:
def function_2(val_1, val_2, param_):
    return [param_] * round(val_1 + val_2)

iris['test_2'] = iris.apply(lambda x: function_2(x['petal_length'], x['sepal_width'], 3), axis=1)
How can I use multiprocessing for function_2, which takes more than one input?
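One direction I am considering, as a rough sketch rather than a confirmed solution: Pool.starmap unpacks each tuple of an iterable into positional arguments, and functools.partial can pin the fixed parameter. The helper name apply_parallel and its processes argument are hypothetical, and plain lists stand in for the dataframe columns so the snippet runs on its own.

```python
import multiprocessing as mp
from functools import partial


def function_2(val_1, val_2, param_):
    # Same "complex function" as above.
    return [param_] * round(val_1 + val_2)


def apply_parallel(col_1, col_2, param_, processes=2):
    """Hypothetical helper: run function_2 over two columns in parallel."""
    # Bind the fixed parameter by keyword; starmap then unpacks each
    # (val_1, val_2) pair into the two remaining positional arguments.
    worker = partial(function_2, param_=param_)
    with mp.Pool(processes) as pool:
        return pool.starmap(worker, zip(col_1, col_2))


if __name__ == "__main__":
    # Stand-in values for iris['petal_length'] and iris['sepal_width'].
    petal_length = [1.4, 4.7, 6.0]
    sepal_width = [3.5, 3.2, 2.2]
    result = apply_parallel(petal_length, sepal_width, 3)
    print([len(r) for r in result])
```

With the dataframe, I assume this would become iris['test_2'] = apply_parallel(iris['petal_length'], iris['sepal_width'], 3), since zipping two Series iterates over their values, but I have not checked whether it is actually faster than apply.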