1

I am trying to group by on a timeseries dataset so that I can find most frequent day of week, week of month etc.

My dataset looks something like this:

ID Date      
1  2020-01-02
1  2020-01-09
1  2020-01-08

My output dataset should look something like this:

ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A 
1  4                  2

(Here Thursday is the mode day of week, and the 2nd week is mode week of month for given dates) So essentially trying to find out the mode(most frequent) day of week and mode(most frequent) week of month for each ID. Any idea how to achieve this in Python? The dataset contains multiple such IDs, this is just example of 1 such ID, the dataset has multiple IDs with similar timestamp data.

sinha-shaurya
  • 549
  • 5
  • 15

3 Answers3

2

You can group by ID using .groupby() and then aggregate using .agg() with named aggregation to set new column names for the 2 source columns dayofweek and week. Get mode by pd.Series.mode

df = pd.DataFrame({"ID":[1,1,1,2,2,2],
                   "Date": ["2020-01-02", "2020-01-09", "2020-01-08", "2020-01-04", "2020-01-21", "2020-01-22"] })
df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = df.Date.dt.isocalendar().week

df_out = (df.groupby('ID', as_index=False)[['dayofweek', 'week']]
            .agg(Pref_Day_Of_Week_A=('dayofweek', lambda x:  pd.Series.mode(x)[0]), 
                 Pref_Week_Of_Month_A=('week', lambda x:  pd.Series.mode(x)[0]))
         )

Result:

print(df_out)

   ID  Pref_Day_Of_Week_A  Pref_Week_Of_Month_A
0   1                   3                     2
1   2                   1                     4
SeaBean
  • 22,547
  • 3
  • 13
  • 25
2

Use custom lambda function with select first mode by Series.mode and Series.iat in named aggregation by GroupBy.agg:

df = pd.DataFrame({"ID":[1,1,1,2,2,2],
                   "Date": ["2020-01-02", "2020-01-09", "2020-01-08"]*2})

#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    
    Parameters: 
        dates (pd.Series): Series of dates.
    Returns: 
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
    

df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = weekinmonth(df['Date'])

f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False).agg(Pref_Day_Of_Week_A=('dayofweek',f), 
                                            Pref_Week_Of_Month_A=('week',f)))
print (df1)
   ID  Pref_Day_Of_Week_A  Pref_Week_Of_Month_A
0   1                   3                     2
1   2                   3                     2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This gives the week of year , not the week of the month. Apart from that it works just fine – sinha-shaurya Jul 12 '21 at 10:46
  • @sinha-shaurya - Can you check [this](https://stackoverflow.com/a/64192858/2901002) for `df['week'] =weekinmonth(df['Date'])` ? – jezrael Jul 12 '21 at 10:52
1
import pandas as pd
df = pd.DataFrame({"ID":[1,1,1], "Date": ["2020-01-02", "2020-01-09", "2020-01-08"]})
df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = df.Date.dt.isocalendar().week
df[['ID', 'dayofweek', 'week']].mode()

output:

   ID  dayofweek  week
0   1          3     2

PS: dayofweek assumes the week starts on Monday, which is denoted by 0 and ends on Sunday which is denoted by 6. which makes Thursday day number 3

Mutaz-MSFT
  • 756
  • 5
  • 20
  • I think I should have mentioned that the dataset contains multiple IDs . So a groupby operation will be needed at some point – sinha-shaurya Jul 12 '21 at 10:03
  • for mode with group by use: df[['ID', 'dayofweek', 'week']].groupby('ID').apply(pd.DataFrame.mode) – Mutaz-MSFT Jul 12 '21 at 10:08
  • df=df.sort_values(by='Date') df2=df.groupby('ID').agg( MODE_DAY_OF_WEEK=('Date',lambda x: stats.mode(x.dt.dayofweek)[0])) I tried this . Is it correct though?It is difficult to manually verify this(stats is from scipy) – sinha-shaurya Jul 12 '21 at 10:14