I would like to run multiple .groupby() operations on different subsets of a given dataset and bind the results together. For example:
import pandas as pd
df = pd.DataFrame({"ID":[1,1,2,2,2,3],"Subset":[1,1,2,2,2,3],"Value":[5,7,4,1,7,8]})
print(df)
   ID  Subset  Value
0   1       1      5
1   1       1      7
2   2       2      4
3   2       2      1
4   2       2      7
5   3       3      8
I would then like to concatenate the following objects and store the result in a pandas data frame:
gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID","Subset"]).mean()
# Why do gr1 and gr2 have column names in different rows?
I realize that df.groupby(["ID","Subset"]).mean() would give me the concatenated object I'm looking for. Please bear with me; this is a reduced example of what I'm actually dealing with.
I think the solution could be to transform gr1 and gr2 into pandas data frames and then concatenate them like I normally would.
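To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes that pd.concat can stack the two grouped results directly and that reset_index() turns the group keys back into ordinary columns; the name combined is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "Subset": [1, 1, 2, 2, 2, 3],
    "Value": [5, 7, 4, 1, 7, 8],
})

# Each result is a DataFrame indexed by the group keys (ID, Subset).
gr1 = df[df["Subset"] == 1].groupby(["ID", "Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID", "Subset"]).mean()

# Stack the two results row-wise, then move the index levels
# back into regular columns to get a flat data frame.
combined = pd.concat([gr1, gr2]).reset_index()
print(combined)
```

If this is the right direction, the result should have ID, Subset and Value as plain columns, with one row per group.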
In essence, my questions are the following:
- How do I convert a groupby result to a data frame object?
- In case this can be done without transforming the series to data frames, how do you bind two groupby results together and then transform that to a pandas data frame?
PS: I come from an R background, so to me it's odd to group a data frame by something and get back a different type of object (a series or a multi-index data frame). This is part of my question too: why does .groupby return a series? What kind of series is this? How come a series can have multiple columns and an index?
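For reference, this is a small sketch of how I have been inspecting the object types. Selecting the single column "Value" before aggregating is my own addition for illustration, and the names as_frame, as_series and flat are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "Subset": [1, 1, 2, 2, 2, 3],
    "Value": [5, 7, 4, 1, 7, 8],
})

grouped = df.groupby(["ID", "Subset"])

# Aggregating all remaining columns gives a DataFrame with a MultiIndex.
as_frame = grouped.mean()

# Aggregating one selected column gives a Series with the same MultiIndex.
as_series = grouped["Value"].mean()

# reset_index() converts the index levels of the Series into columns.
flat = as_series.reset_index()

print(type(as_frame).__name__, type(as_series).__name__, type(flat).__name__)
print(list(as_frame.index.names))
```

As far as I can tell, the "multiple columns" I see when printing the series are the levels of its index rather than real columns.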