7

Beginner question. This seems like it should be a straightforward operation, but I can't figure it out from reading the docs.

I have a df with this structure:

|integer_id|int_field_1|int_field_2|

The integer_id column is non-unique, so I'd like to group the df by integer_id and sum the two fields.

The equivalent SQL is:

SELECT integer_id, SUM(int_field_1), SUM(int_field_2) FROM tbl
GROUP BY integer_id

Any suggestions on the simplest way to do this?

EDIT: Including input/output

Input:  
integer_id  int_field_1 int_field_2   
2656        36          36  
2656        36          36  
9702        2           2  
9702        1           1  

Ouput using df.groupby('integer_id').sum():

integer_id  int_field_1 int_field_2  
2656        72          72  
9702        3           3  
Hugo
  • 1,106
  • 15
  • 25
acpigeon
  • 1,699
  • 9
  • 20
  • 30

3 Answers3

11

You just need to call sum on a groupby object:

df.groupby('integer_id').sum()

See the docs for further examples

EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Just to add to this a bit, since my situation was slightly more complicated: if you want to group by mutiple fields the only difference is `df.groupby(['field1', 'field2']).sum()` – Herms Jun 08 '20 at 15:29
1

You can do it

data.groupby(by=['account_ID'])['purchases'].sum()
Bastin Robin
  • 907
  • 16
  • 30
0

A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.

df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})

using your values...

df.groupby(['integer_id']).agg({'int_field_1': "sum", 'int_field_2': "sum" })
xxyjoel
  • 551
  • 5
  • 7