43

When grouping a Pandas DataFrame, when should I use transform and when should I use aggregate? How do they differ with respect to their application in practice and which one do you consider more important?

piRSquared
Sylvi0202

1 Answer

82

Consider the dataframe df:

import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))



The standard use of groupby is as an aggregator, which returns one row per group:

df.groupby('A').mean()
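As a rough sketch of what that call returns (the variable name `means` is only for illustration), the result is indexed by the group keys rather than by the original rows:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation collapses each group to a single row, indexed by the key 'A'.
means = df.groupby('A').mean()
print(means)
# Group 'a': B mean 1.5, C mean 4.5
# Group 'b': B mean 3.5, C mean 4.5

# Four input rows became two output rows, one per group.
print(len(df), len(means))
```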



If you instead want these values broadcast across every row of each group, so that the result has the same index as the dataframe you started with, use transform:

df.groupby('A').transform('mean')


df.set_index('A').groupby(level='A').transform('mean')
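Because the transformed result lines up with the original index, it can be assigned straight back as a new column. The following is a minimal sketch of that pattern; the column names `B_group_mean` and `B_centered` are made up for this example:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# transform returns one value per original row, not one per group.
group_means = df.groupby('A').transform('mean')
print(group_means.index.equals(df.index))

# Hypothetical column names: attach the group mean of B to each row,
# then subtract it to center B within each group.
df_with_mean = df.assign(B_group_mean=group_means['B'])
df_with_mean['B_centered'] = df_with_mean['B'] - df_with_mean['B_group_mean']
print(df_with_mean)
```

This kind of group-wise centering or normalizing is a typical case where transform is more convenient than aggregating and merging the result back.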



Use agg when you want to apply different functions to different columns, or more than one function to the same column:

df.groupby('A').agg(['mean', 'std'])


df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
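One thing worth knowing about the dict form is that it produces two-level column labels. The sketch below shows those labels and, as an alternative that is not part of the answer above, the keyword-based named aggregation available in recent pandas versions, which gives flat column names. The names `nested`, `flat`, `B_sum`, `C_mean` and `C_prod` are chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Dict form: each column gets its own function or list of functions.
# The resulting columns are (column, function) pairs.
nested = df.groupby('A').agg(dict(B='sum', C=['mean', 'prod']))
print(list(nested.columns))

# Named aggregation (an alternative, assuming a pandas version that supports it):
# each keyword is an output column name mapped to (input column, function).
flat = df.groupby('A').agg(
    B_sum=('B', 'sum'),
    C_mean=('C', 'mean'),
    C_prod=('C', 'prod'),
)
print(flat)
```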


piRSquared