1

I have 2 datasets to work with:

ID Date       Amount   
1  2020-01-02 1000
1  2020-01-09 200
1  2020-01-08 400

And another dataset which tells which is most frequent day of week and most frequent week of month for each ID(there are multiple such IDs)

ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A 
1  3                 2

For this ID ,Thursday was the most frequent day of the week for ID 1 and 2nd week of the month was the most frequent week of the month.

I wish to find sum of all the amounts that took place on the most frequent day of week and frequent week of month, for all IDs(hence requiring groupby):

ID Amount_On_Pref_Day Amount_Pref_Week
1  1200               600

I would really appreciate it if anyone could help me calculating this dataframe using pandas. For reference, I have used this function to find the week of month for a given date:

#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    
    Parameters: 
        dates (pd.Series): Series of dates.
    Returns: 
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
sinha-shaurya
  • 549
  • 5
  • 15

1 Answers1

1

Idea is filter only matched dayofweek and week and aggregate sum, last join together by concat:

#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    
    Parameters: 
        dates (pd.Series): Series of dates.
    Returns: 
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1



df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = weekinmonth(df['Date'])

f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False).agg(Pref_Day_Of_Week_A=('dayofweek',f), 
                                            Pref_Week_Of_Month_A=('week',f)))

s1 = df1.rename(columns={'Pref_Day_Of_Week_A':'dayofweek'}).merge(df).groupby('ID')['Amount'].sum()
s2 = df1.rename(columns={'Pref_Week_Of_Month_A':'week'}).merge(df).groupby('ID')['Amount'].sum()

df2 = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day','Amount_Pref_Week'))
print (df2)
    Amount_On_Pref_Day  Amount_Pref_Week
ID                                      
1                 1200               600
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252