One of the great features of groupby objects in Pandas is the ability to use apply to run arbitrary functions on groups. I am trying to parallelize this using multiprocessing.
So, starting out with a single groupby object, I want to:
- split it into multiple groupby objects
- feed them to multiprocessing.Pool workers
- run groupby.apply on them
- concatenate the results
Here's the dream workflow in code:
from multiprocessing import Pool
import pandas as pd

# create the initial groupby
gb = df.groupby('variable')

# split into multiple groupby's (hypothetical API; this method doesn't exist)
many_groupbys = gb.split(n_chunks=10)
# now many_groupbys is a list of 10 groupby objects

# this is our transformer
def func(groupby):
    return groupby.apply(transformation)

# submit to pool
with Pool(10) as pool:
    results = pool.map(func, many_groupbys)
result = pd.concat(results)
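For context, the closest approximation I've come up with is to split the unique group keys into chunks myself and rebuild a groupby inside each worker. A rough sketch (assuming the same df, 'variable' column, and transformation as above; note this regroups per chunk rather than splitting the existing groupby object, which is exactly what I'd like to avoid):

import numpy as np
import pandas as pd
from multiprocessing import Pool

def transformation(group):
    # stand-in for the real per-group logic
    return group

def apply_chunk(chunk):
    # each worker rebuilds a groupby on its slice and applies the function
    return chunk.groupby('variable').apply(transformation)

# split the unique keys into 10 roughly equal chunks, so every
# group lands intact in exactly one chunk
key_chunks = np.array_split(df['variable'].unique(), 10)
frames = [df[df['variable'].isin(keys)] for keys in key_chunks]

if __name__ == '__main__':  # guard needed on platforms that spawn workers
    with Pool(10) as pool:
        results = pool.map(apply_chunk, frames)
    result = pd.concat(results)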
So, is there a way to split a single groupby object into multiple groupby objects? And is there a better workflow for parallelizing computations on dataframes when you can't split arbitrarily on rows and the processing has to happen on groups of rows?
Please note: I don't want to process groups individually; I want to work with groupby objects.
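To make the distinction concrete, this is the per-group pattern I'm trying to avoid; it ships one group at a time to the pool instead of whole groupby objects:

# the pattern I do NOT want: mapping over individual groups
with Pool(10) as pool:
    results = pool.map(transformation, [group for _, group in gb])
result = pd.concat(results)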