I'd like to create a new dataframe from the results of a groupby on another. The result should have one row per group (basically a vectorized map-reduce), and the new column names bear no relation to the existing names. This seems like a natural use for agg, but it only seems able to produce columns named after existing ones.
    import pandas as pd

    d = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [3, 4, 5, 6], 'c': [7, 8, 9, 0]})

       a  b  c
    0  0  3  7
    1  0  4  8
    2  1  5  9
    3  1  6  0
When aggregating a single column (a Series), agg() will create new, arbitrarily named columns from a dict:
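    # Older pandas only: since pandas 1.0 a dict here raises
    # SpecificationError ("nested renamer is not supported").
    d.groupby('a')['b'].agg({'x': lambda g: g.sum()})

        x
    a
    0   7
    1  11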
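On pandas 0.25 or newer, the keyword ("named aggregation") form of agg is one way to get the same renamed column without a dict; each keyword becomes an output column name. This is a sketch that rebuilds the example frame so it runs on its own.

```python
import pandas as pd

d = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [3, 4, 5, 6], 'c': [7, 8, 9, 0]})

# Named aggregation on a SeriesGroupBy: the keyword 'x' becomes the output
# column name, and the value may be a callable or a function name string.
per_group = d.groupby('a')['b'].agg(x=lambda g: g.sum())
print(per_group)
```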
But, frustratingly, not with a DataFrame, where the dict keys are looked up as names of existing columns:

    d.groupby('a').agg({'x': lambda g: g.b.sum()})

    KeyError: 'x'
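A minimal sketch of named aggregation on the whole DataFrameGroupBy, assuming pandas 0.25 or newer: each keyword is the new column name and the tuple gives (source column, function). The c_max column is only an illustration. Note that each function still sees one source column at a time, so a cross-column quantity like y = sum(b * c) cannot be written this way directly.

```python
import pandas as pd

d = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [3, 4, 5, 6], 'c': [7, 8, 9, 0]})

# keyword = new column name; tuple = (existing column, aggregation)
out = d.groupby('a').agg(x=('b', 'sum'), c_max=('c', 'max'))
print(out)
```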
I can do it by returning a one-row DataFrame from apply():

    (d.groupby('a')
      .apply(lambda g: pd.DataFrame([{'x': g.b.mean(), 'y': (g.b * g.c).sum()}]))
      .reset_index(level=1, drop=True))

         x   y
    a
    0  3.5  53
    1  5.5  45
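A somewhat tidier variant of the same apply() idea, sketched with a hypothetical helper named summarize: return a pd.Series per group instead of a one-row DataFrame, and apply() lays each Series out as one row, so the reset_index step is no longer needed. It still makes one Python call per group, so it tidies the code more than it speeds it up.

```python
import pandas as pd

d = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [3, 4, 5, 6], 'c': [7, 8, 9, 0]})

def summarize(g):
    # Hypothetical helper: one Series per group; its index ('x', 'y')
    # becomes the column names of the combined result.
    return pd.Series({'x': g['b'].mean(), 'y': (g['b'] * g['c']).sum()})

# Selecting ['b', 'c'] keeps the grouping column out of each group.
out = d.groupby('a')[['b', 'c']].apply(summarize)
print(out)
```

Because each per-group Series mixes a float and an integer, y comes back as a float column here.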
But this is ugly, and building a new dict, list, and DataFrame for every group is slow even for modestly sized inputs.
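One way around the per-group Python calls, assuming pandas 0.25 or newer: compute the row-wise product b * c once over the whole frame, then let built-in aggregations do the per-group work. The helper column name bc is hypothetical. Since every output column then depends on a single source column, named aggregation covers both x and y.

```python
import pandas as pd

d = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [3, 4, 5, 6], 'c': [7, 8, 9, 0]})

out = (
    d.assign(bc=d['b'] * d['c'])            # hypothetical helper column: b * c per row
     .groupby('a')
     .agg(x=('b', 'mean'), y=('bc', 'sum'))  # built-in reductions by name
)
print(out)
```

With string function names like 'mean' and 'sum', pandas can use its optimized groupby reductions instead of calling a Python function once per group.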