Following the second answer on this post, I have tried the following code:
from multiprocessing import Pool
import numpy as np
from itertools import repeat
import pandas as pd

def doubler(number, r):
    result = number * 2 + r
    return result

def f1():
    return np.random.randint(20)

if __name__ == '__main__':
    df = pd.DataFrame({"A": [10,20,30,40,50,60], "B": [-1,-2,-3,-4,-5,-6]})
    num_chunks = 3
    # break df into 3 chunks
    chunks_dict = {i:np.array_split(df, num_chunks)[i] for i in range(num_chunks)}
    arg1 = f1()
    with Pool() as pool:
        results = pool.starmap(doubler, [zip(chunks_dict[i]['B'], repeat(arg1)) for i in range(num_chunks)])
    print(results)
>>> [(-1, 20, -1, 20, -2, 20), (-3, 20, -3, 20, -4, 20), (-5, 20, -5, 20, -6, 20)]
This is not the result I want. What I want is to feed each element of column B of df into the doubler function, together with the output from f1 (this is why I am using starmap and repeat), to get a list in which each input is doubled and the same random integer is added to it.
For example, if the output of f1 was 2, then I want to return:
>>> [0,-2,-4,-6,-8,-10] # [2*(-1) + 2, 2*(-2) + 2, ... ]
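To make the target concrete, here is the same computation written serially, without the pool. This is only a reference sketch: b_values is a plain list standing in for df['B'], and arg1 is fixed at 2 in place of the random value from f1() so that the output is reproducible.

```python
def doubler(number, r):
    result = number * 2 + r
    return result

# Plain-list stand-in for df['B'] (assumption, to avoid needing pandas here)
b_values = [-1, -2, -3, -4, -5, -6]

# Fixed stand-in for f1() so the result is deterministic
arg1 = 2

# One doubler call per element, with arg1 passed as the second argument
expected = [doubler(b, arg1) for b in b_values]
print(expected)  # [0, -2, -4, -6, -8, -10]
```

The parallel version should produce this same flat list, in the same order.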
Can anyone advise how I would achieve this desired result? Thanks
EDIT: Inserting the whole data frame does not work either:
with Pool() as pool:
    results = pool.starmap(doubler, [zip(df['B'], repeat(arg1))])
>>> TypeError: doubler() takes 2 positional arguments but 6 were given
Essentially, I just want to break my dataframe into chunks and pass these chunks, along with other variables (arg1), to a function that accepts more than one argument.
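For illustration, the shape I have in mind might look like the sketch below. It is not a confirmed solution. It relies on the fact that starmap unpacks each element of the iterable it receives as one set of arguments, so each element here is a (chunk, arg1) tuple rather than a zip object. The helpers process_chunk and split_into_chunks are hypothetical names introduced for this example, a plain list stands in for df['B'], and arg1 is fixed at 2 in place of f1().

```python
from itertools import chain, repeat
from multiprocessing import Pool


def doubler(number, r):
    return number * 2 + r


def process_chunk(chunk, r):
    # Hypothetical helper: apply doubler to every value in one chunk
    return [doubler(number, r) for number in chunk]


def split_into_chunks(values, num_chunks):
    # Hypothetical helper: contiguous chunks of near-equal size
    size, extra = divmod(len(values), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


if __name__ == '__main__':
    b_values = [-1, -2, -3, -4, -5, -6]  # stand-in for df['B'].tolist()
    arg1 = 2  # fixed stand-in for f1()
    chunks = split_into_chunks(b_values, 3)
    with Pool() as pool:
        # Each (chunk, arg1) tuple is unpacked into process_chunk(chunk, arg1)
        per_chunk = pool.starmap(process_chunk, zip(chunks, repeat(arg1)))
    # Flatten the per-chunk lists back into one list, preserving order
    results = list(chain.from_iterable(per_chunk))
    print(results)  # [0, -2, -4, -6, -8, -10]
```

Each worker receives a whole chunk plus the extra argument, and the per-chunk results are flattened afterwards. With a real dataframe, each chunk's B column would take the place of the list slices.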