I found this question on parallelizing groupby. However, it can't be translated one-to-one into the case where there are multiple extra arguments - unless I'm mistaken.
Is the following a correct way of doing it? Is there a better way? (In particular, building the index by iterating over the groups a second time seems quite inefficient.)
    from multiprocessing import Pool, cpu_count

    import pandas as pd

    def applyParallel(dfGrouped, func, *args):
        with Pool(cpu_count() - 2) as p:
            # Build one argument tuple per group: (group, arg1, arg2, ...).
            # starmap unpacks each tuple into a call func(group, arg1, arg2, ...).
            # (Note: zip(groups, repeat(*args)) only works for a single extra
            # argument, since repeat() takes at most an object and a count.)
            ret_list = p.starmap(func, [(group, *args) for name, group in dfGrouped])
        # Iterate the GroupBy a second time just to collect the group names.
        index = [name for name, group in dfGrouped]
        return pd.Series(index=index, data=ret_list)
which one would call as applyParallel(df.groupby(foo), someFunc, someArgs).
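For concreteness, here is a minimal runnable sketch of the call pattern. It uses a thread-backed Pool from multiprocessing.dummy (same API as multiprocessing.Pool) purely so the demo avoids pickling concerns; scaled_sum, the column names, and the pool size are made up for illustration:

```python
from multiprocessing.dummy import Pool  # thread pool: same starmap API, no pickling needed in a demo

import pandas as pd

def applyParallel(dfGrouped, func, *args):
    with Pool(2) as p:
        # One argument tuple per group; starmap calls func(group, *args).
        ret_list = p.starmap(func, [(group, *args) for name, group in dfGrouped])
    index = [name for name, group in dfGrouped]
    return pd.Series(index=index, data=ret_list)

# Hypothetical worker: sum of column "x", scaled by an extra argument.
def scaled_sum(group, factor):
    return group["x"].sum() * factor

df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 5]})
result = applyParallel(df.groupby("key"), scaled_sum, 10)
print(result)  # a -> 30, b -> 50
```

With a real multiprocessing.Pool the worker has to be picklable (defined at module top level), which is why the demo uses threads.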