When grouping a Pandas DataFrame, when should I use `transform` and when should I use `aggregate`? How do they differ with respect to their application in practice, and which one do you consider more important?

piRSquared

Sylvi0202
1 Answer
Consider the dataframe `df`:
import pandas as pd
df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))
A plain `groupby` followed by a reduction such as `mean` is the standard aggregation. It returns one row per group:
df.groupby('A').mean()
Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with. For that, use `transform`:
df.groupby('A').transform('mean')
df.set_index('A').groupby(level='A').transform('mean')
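To make the shape difference concrete, here is a minimal sketch using the same `df`. The variable names and the `B_group_mean` column are made up for illustration; the point is only that the aggregated result has one row per group, while the transformed result lines up with the original rows and can be assigned back.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation: one row per group, indexed by the group key 'A'.
agg_means = df.groupby('A').mean()

# Transform: same index and length as df, with each group's mean
# repeated on every row that belongs to that group.
row_means = df.groupby('A').transform('mean')

# Because the index matches df, the result can be attached directly.
# 'B_group_mean' is a hypothetical column name chosen for this example.
out = df.assign(B_group_mean=row_means['B'])

print(agg_means.shape)   # two groups, two value columns
print(row_means.shape)   # four rows, two value columns
print(out)
```

This is usually the deciding factor in practice: reach for the aggregation when you want a summary table, and for `transform` when you want a per-row value to put next to the original data.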
`agg` is used when you want to run different functions on different columns, or more than one function on the same column:
df.groupby('A').agg(['mean', 'std'])
df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
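One thing worth knowing about these two calls is what the result looks like. As a rough sketch (variable names are illustrative only), both return two-level columns of the form (original column, function name), so a single statistic is selected with a tuple:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# A list of functions is applied to every non-key column.
stats = df.groupby('A').agg(['mean', 'std'])

# A dict chooses the functions column by column.
mixed = df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))

# Columns are (original column, function name) pairs.
print(list(stats.columns))
print(list(mixed.columns))

# Select one statistic with a tuple key.
b_sum = mixed[('B', 'sum')]
print(b_sum)
```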

piRSquared
- fabulously tremendous answer! – mathopt Jul 28 '17 at 04:24
- By using `agg` how can I return to original data-frame `df` exploding the aggregated columns? – MAC Aug 12 '21 at 12:12
- @MAC To explode columns, use `transform`. – Chris Coffee Jun 24 '22 at 07:14