I've got a dataframe with dates, and I want to select the highest date in each week excluding weekends (so Fridays, if available), unless there is no Monday to Friday data and Saturday/Sunday are the only one's available.
The sample data can be setup like this:
dates = pd.Series(data=['2018-11-05', '2018-11-06', '2018-11-07', '2018-11-08', '2018-11-09',
'2018-11-12', '2018-11-13', '2018-11-14', '2018-11-15', '2018-11-17',
'2018-11-19',
'2018-12-01',
])
nums = np.random.randint(50, 100, 12)
# nums
# array([95, 80, 81, 51, 98, 62, 50, 55, 59, 77, 69])
df = pd.DataFrame(data={'dates': dates, 'nums': nums})
df['dates'] = pd.to_datetime(df['dates'])
The records I want:
- 2018-11-09 is Friday
- 2018-11-15 is Thursday (not 2018-11-17 coz it's Saturday)
- 2018-11-19 is Monday and the only record for that week
- 2018-12-01 is Saturday but the only record for that week
My current solution is in the answer below but I don't think it's ideal and has some issues I had to work around. Briefly, it's:
- groupby week:
df.groupby(df['dates'].dt.week).apply(some_function)
- if there's just one record for that week, return it
- otherwise, select the highest/latest record with day <= Friday and return that
Ideally, I'd like a way to write:
[latest Mon-Fri record] if [has Mon-Fri record] else [latest Sat-Sun record]