-1

I'm referring to this post where one custom lambda function is applied to one specific column during the aggregate step while grouping.

In [67]: f = {'A':['sum','mean'], 'B':['prod'], 'D': lambda g: df.ix[g.index].E.sum()}

In [69]: df.groupby('GRP').agg(f)
Out[69]:
            A                   B         D
          sum      mean      prod  <lambda>
GRP
0    0.719580  0.359790  0.102004  1.170219
1    0.454824  0.227412  0.034060  1.182901

I'm interested if this is possible without specifying column A and B explicitly. I'm looking for a way where a standard method like sum or mean is applied to all columns except column D and the custom lambda function is applied to column D only.

Something like this:

f = {'sum', 'D': lambda g: df.ix[g.index].E.sum()}
Matthias
  • 5,574
  • 8
  • 61
  • 121

1 Answers1

2

If it's for all columns, just create a simple list comprhension, and then update D, that should be good enough.

# default aggregate function for all columns    
f = {x : 'sum' for x in df.columns}

# override custom function for specific column
f['D'] = lambda g: df.loc[g.index].E.sum()

# apply aggregate to groupby
df.groupby('GRP').agg(f).reindex(columns=df.columns)


            A         B         C         D         E  GRP
GRP                                                       
0    0.719581  0.684144  0.707859  1.170218  1.170218    0
1    0.454824  0.523983  0.181761  1.182902  1.182902    2
Matthias
  • 5,574
  • 8
  • 61
  • 121
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Hmm, thought about that solution too. Currently I'm doing two groupby-agg operations and join the two results based on the index. but your solution should be better in terms of performance. – Matthias Nov 01 '17 at 11:00
  • Yep, it's much simpler, .... if you want the order restored, just call `reindex`. Edited my answer to include that. – cs95 Nov 01 '17 at 11:02