I have a dataset in this form:

Customer_key    Issue_dt      Amount
45435           2021-03-19    566
64352           2021-06-22    843
43766           2020-04-29    754
45435           2021-06-21    547

Many Customer_key values repeat with different Issue_dt values. I want to group by Customer_key and get the total Amount for the year 2021 only. Can someone suggest how to do that?

Rishavv
  • filter, then apply your transform – mozway Feb 01 '22 at 09:08
  • If you only need to filter by `2021`, then closing as a duplicate is correct (100% match); if you need the year before the current one, it is not a dupe, only similar (about 70% in my opinion). Can you specify which? – jezrael Feb 01 '22 at 09:18

2 Answers

If you need to filter by year, use boolean indexing and then aggregate with sum (this assumes Issue_dt is already a datetime column):

df[df['Issue_dt'].dt.year == 2021].groupby('Customer_key', as_index=False)['Amount'].sum()

For a dynamic solution, take the current year and subtract 1:

y = pd.to_datetime('now').year - 1
df[df['Issue_dt'].dt.year == y].groupby('Customer_key', as_index=False)['Amount'].sum()
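
The two snippets above can be put together as a runnable sketch. It assumes the sample data from the question and converts Issue_dt to datetime first, since the `.dt` accessor needs a datetime column. The helper name `total_for_year` is hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data from the question; Issue_dt starts out as plain strings.
df = pd.DataFrame({
    "Customer_key": [45435, 64352, 43766, 45435],
    "Issue_dt": ["2021-03-19", "2021-06-22", "2020-04-29", "2021-06-21"],
    "Amount": [566, 843, 754, 547],
})

# The .dt accessor only works on datetime columns, so convert once up front.
df["Issue_dt"] = pd.to_datetime(df["Issue_dt"])


def total_for_year(frame, year):
    """Hypothetical helper: total Amount per Customer_key for one year."""
    mask = frame["Issue_dt"].dt.year == year  # boolean indexing
    return frame[mask].groupby("Customer_key", as_index=False)["Amount"].sum()


# Fixed year, as asked in the question.
totals_2021 = total_for_year(df, 2021)

# Dynamic variant: the year before the current one.
previous_year = pd.Timestamp.now().year - 1
totals_previous = total_for_year(df, previous_year)
```

For the sample data, `totals_2021` has one row per customer: 45435 with 1113 and 64352 with 843. The 2020 row is dropped by the filter before grouping.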
jezrael

Use:

import pandas as pd

df = pd.DataFrame({'Customer_key':[45435,64352,43766,45435], 'Issue_dt': ['2021-03-19','2021-06-22','2020-04-29','2021-06-21'], 'Amount': [566, 843, 754, 547]})

# Select 'Amount' before summing so the string column Issue_dt is not aggregated too
df[pd.to_datetime(df['Issue_dt']).dt.year==2021].groupby('Customer_key')['Amount'].sum()

First filter df by year (after converting Issue_dt to datetime), then sum Amount within each Customer_key group.
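
As a quick check of the steps just described, here is a minimal sketch on the question's sample data. It leaves Issue_dt as strings in the frame and converts only for the mask; the variable names `is_2021` and `result` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_key": [45435, 64352, 43766, 45435],
    "Issue_dt": ["2021-03-19", "2021-06-22", "2020-04-29", "2021-06-21"],
    "Amount": [566, 843, 754, 547],
})

# Convert on the fly for the mask; df["Issue_dt"] itself stays a string column.
is_2021 = pd.to_datetime(df["Issue_dt"]).dt.year == 2021

# Selecting "Amount" gives a Series indexed by Customer_key.
result = df[is_2021].groupby("Customer_key")["Amount"].sum()

print(result.to_dict())  # {45435: 1113, 64352: 843}
```

Without `as_index=False`, the customer keys become the index of the result, which is convenient for lookups such as `result.loc[45435]`.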

keramat