1

I have the following df,

id    year_month    amount
10    201901        10
10    201901        20
10    201901        30
20    201902        40
20    201902        20

I want to groupby id and year-month and then get the group size and sum of amount,

df.groupby(['id', 'year_month'], as_index=False)['amount'].sum()

df.groupby(['id', 'year_month'], as_index=False).size().reset_index(name='count')

I am wondering how to do it at the same time in one line;

id    year_month    amount    count
10    201901        60        3
20    201902        60        2
Mohit Motwani
  • 4,662
  • 3
  • 17
  • 45
daiyue
  • 7,196
  • 25
  • 82
  • 149
  • 1
    `df.groupby(['id', 'year_month'])['amount'].agg(['sum','size']).reset_index()` do it with `.agg()` – anky Jul 17 '19 at 09:44
  • @anky_91 whats the difference between `size` and `count` for `agg`? – daiyue Jul 17 '19 at 09:50
  • Actually [this](https://stackoverflow.com/questions/33346591/what-is-the-difference-between-size-and-count-in-pandas) is a good read for that question. :) – anky Jul 17 '19 at 09:53

1 Answers1

5

Use agg:

df.groupby(['id', 'year_month']).agg({'amount': ['count', 'sum']})


                    amount
                   count    sum
id  year_month      
10  201901          3       60
20  201902          2       60

If you want to remove the multi-index, use MultiIndex.droplevel:

s = df.groupby(['id', 'year_month']).agg({'amount': ['count', 'sum']}).rename(columns ={'sum': 'amount'})
s.columns = s.columns.droplevel(level=0)
s.reset_index()

    id  year_month  count   amount
0   10  201901        3      60
1   20  201902        2      60
Mohit Motwani
  • 4,662
  • 3
  • 17
  • 45