Concatenate multiple pandas groupby outputs

Question

I would like to run multiple .groupby() operations on different subsets of a given dataset and bind the results together. For example:

import pandas as pd
df = pd.DataFrame({"ID":[1,1,2,2,2,3],"Subset":[1,1,2,2,2,3],"Value":[5,7,4,1,7,8]})
print(df)
   ID  Subset  Value
0   1       1      5
1   1       1      7
2   2       2      4
3   2       2      1
4   2       2      7
5   3       3      8

I would then like to concatenate the following objects and store the result in a pandas data frame:

gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID","Subset"]).mean()
# Why do gr1 and gr2 have column names in different rows?

I realize that df.groupby(["ID","Subset"]).mean() would give me the concatenated object I'm looking for. Just bear with me, this is a reduced example of what I'm actually dealing with.

I think the solution could be to transform gr1 and gr2 to pandas data frames and then concatenate them like I normally would.

In essence, my questions are the following:

How do I convert a groupby result to a data frame object?
In case this can be done without transforming the series to data frames, how do you bind two groupby results together and then transform that to a pandas data frame?

PS: I come from an R background, so to me it's odd to group a data frame by something and have the output return as a different type of object (series or multi index data frame). This is part of my question too: why does .groupby return a series? What kind of series is this? How come a series can have multiple columns and an index?

If you use agg operation after group by it returns a dataframe object — Parijat Bhatt, Aug 23 '19 at 18:39
On my pandas 0.25, `gr1` and `gr2` are dataframes. What version are you on? You can convert them like this `gr1.to_frame()` — Code Different, Aug 23 '19 at 18:46
@CodeDifferent You are right. `gr1` and `gr2` are pandas data frames, but the column names are on different rows. Why is this? — Arturo Sbr, Aug 23 '19 at 19:54

score 1 · Accepted Answer · answered Aug 24 '19 at 10:33

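The result in your example is a DataFrame whose index is a pandas MultiIndex built from the grouping keys, which is why the column names appear on different rows. To get a DataFrame with the keys as ordinary columns when applying a single aggregation function to a single value column, you can use the following. Note the inclusion of as_index=False.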

>>> gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"], as_index=False).mean()
>>> gr1

   ID  Subset  Value
0   1       1    6.0
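
If you already have a result grouped with the default settings, a minimal sketch of the conversion (assuming the same df as in the question) is to call reset_index(), which moves the MultiIndex levels back into ordinary columns:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

# Default behaviour: the grouping keys become a MultiIndex on a DataFrame.
indexed = df[df["Subset"] == 1].groupby(["ID", "Subset"]).mean()
print(type(indexed).__name__)        # DataFrame
print(list(indexed.index.names))     # ['ID', 'Subset']

# reset_index() turns the index levels into regular columns again.
flat = indexed.reset_index()
print(list(flat.columns))            # ['ID', 'Subset', 'Value']
```

This should give the same flat layout as passing as_index=False to groupby.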

This, however, won't work if you wish to aggregate with multiple functions. If you wish to avoid calling df.groupby(["ID","Subset"]).mean() on the whole frame, you can use the following for your example.

>>> gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"], as_index=False).mean()
>>> gr2 = df[df["Subset"] == 2].groupby(["ID","Subset"], as_index=False).mean()

>>> pd.concat([gr1, gr2]).reset_index(drop=True)

   ID  Subset  Value
0   1       1    6.0
1   2       2    4.0
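
On the second question (binding the results without converting them first): a sketch, again assuming the df from the question, is to concatenate the MultiIndex-ed results directly and flatten once at the end:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

# Group each subset with the default as_index=True (keys stay in the index).
gr1 = df[df["Subset"] == 1].groupby(["ID", "Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID", "Subset"]).mean()

# Stack the two results, then move the index levels into columns.
combined = pd.concat([gr1, gr2]).reset_index()
print(combined)
```

Both routes should end in the same frame; which one to use is mostly a matter of whether you need the keys as an index in between.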

If you're only concerned with dealing with a specific subset of rows, the following could be applicable, since it removes the necessity to concatenate results.

>>> values = [1,2]
>>> df[df['Subset'].isin(values)].groupby(["ID","Subset"], as_index=False).mean()

   ID  Subset  Value
0   1       1    6.0
1   2       2    4.0
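
For the multiple-function case mentioned earlier, one option is named aggregation followed by reset_index(). This is only a sketch: it assumes pandas 0.25 or later, and the output column names mean_value and max_value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

values = [1, 2]

# Named aggregation: each keyword is an output column (hypothetical names),
# each tuple is (input column, aggregation function).
summary = (
    df[df["Subset"].isin(values)]
    .groupby(["ID", "Subset"])
    .agg(mean_value=("Value", "mean"), max_value=("Value", "max"))
    .reset_index()  # move ID and Subset from the index into columns
)
print(summary)
```

The result has flat column names, so it can be concatenated with other frames like any ordinary DataFrame.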
