
If I have a pandas DataFrame df, the following three ways of calculating the column means give the same result:

import numpy as np
df.mean(axis=0)         # built-in column-wise mean
df.apply(np.mean)       # np.mean called on each column
df.aggregate(np.mean)   # aggregation with a NumPy callable
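A minimal, self-contained version of this comparison, assuming a small hypothetical all-numeric frame. It passes the string alias "mean" to `aggregate`, because recent pandas versions emit a FutureWarning when a NumPy reduction such as `np.mean` is passed there; the equality check uses `pandas.testing.assert_series_equal`.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric data, used only for illustration.
df = pd.DataFrame({"A": [1, 1, 2], "B": [1.0, 2.0, 3.0], "C": [4.0, 5.0, 6.0]})

by_method = df.mean(axis=0)          # built-in column-wise reduction
by_apply = df.apply(np.mean)         # np.mean receives one column (a Series) at a time
by_aggregate = df.aggregate("mean")  # string alias for the built-in mean

# All three produce the same Series, indexed by column name.
pd.testing.assert_series_equal(by_method, by_apply)
pd.testing.assert_series_equal(by_method, by_aggregate)
print(by_method)
```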

But what if I group the data and use these methods in a similar way:

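groups = df.groupby(by='A')
groups.mean()               # built-in groupby mean
groups.apply(np.mean)       # np.mean called on each group's sub-DataFrame
groups.aggregate(np.mean)   # aggregation with a NumPy callable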

In this example, .mean and .aggregate give the same result, but .apply does not: with .apply, the grouping column 'A' is returned both as the index and as a column (which was not what I expected or wanted when I came across this issue).
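A sketch of how the grouped results line up, using hypothetical data where 'A' is the key. `apply` passes each group's whole sub-DataFrame (including 'A') to the function, so one version-independent way to keep 'A' out of the columns is to select the value columns before calling `apply`; `groups.mean()` and `groups.agg("mean")` already return 'A' only as the index. Recent pandas releases (2.2 and later) deprecate including the grouping columns in `DataFrameGroupBy.apply` and add an `include_groups` parameter, so the exact output of the original `groups.apply(np.mean)` call depends on the installed version; the sketch below avoids relying on it.

```python
import pandas as pd

# Hypothetical data: 'A' is the grouping key, 'B' and 'C' are values.
df = pd.DataFrame({"A": [1, 1, 2], "B": [1.0, 3.0, 5.0], "C": [2.0, 4.0, 6.0]})
groups = df.groupby("A")

via_mean = groups.mean()      # 'A' appears only as the index
via_agg = groups.agg("mean")  # same result as .mean()

# Selecting the value columns first means each sub-DataFrame handed to
# the function contains only 'B' and 'C', so 'A' cannot leak into the columns.
via_apply = groups[["B", "C"]].apply(lambda g: g.mean())

pd.testing.assert_frame_equal(via_mean, via_agg)
pd.testing.assert_frame_equal(via_mean, via_apply)
print(via_apply)
```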

This behaviour seems inconsistent to me. Or am I missing some fundamental difference between these three methods?

Martin Alexandersson
  • Thanks, can you send me a link to the duplicate? – Martin Alexandersson Aug 17 '18 at 15:38
  • In the groupby documentation (http://pandas.pydata.org/pandas-docs/stable/groupby.html) I found my answer. "Note apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to it. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices." – Martin Alexandersson Aug 17 '18 at 15:54
  • I don't think this one is answered in the duplicate question... – Martin Alexandersson Aug 17 '18 at 15:55
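To illustrate the documentation note quoted in the comments, here is a hedged sketch of the three roles `apply` can play: reducer, transformer, and filter. It assumes hypothetical data, applies to a single value column (a `SeriesGroupBy`) so the key column is never passed to the function, and sets `group_keys=False` so the group labels are not prepended to the index of the transform and filter results.

```python
import pandas as pd

# Hypothetical data: 'A' is the key, 'B' holds the values.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1.0, 3.0, 10.0]})
groups = df.groupby("A", group_keys=False)["B"]

# Reducer: one scalar per group, indexed by the group key.
reduced = groups.apply(lambda s: s.mean())

# Transformer: same index as each group, so rows come back in the original order.
transformed = groups.apply(lambda s: s - s.mean())

# Filter: keeps only the rows at or above their group's mean.
filtered = groups.apply(lambda s: s[s >= s.mean()])

print(reduced, transformed, filtered, sep="\n\n")
```

For the transform and filter cases, pandas also provides dedicated `GroupBy.transform` and `GroupBy.filter` methods, whose output shapes do not depend on what the function returns: `transform` returns a result indexed like the input, and `filter` keeps or drops whole groups.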
