I have a DatetimeIndex like
>>> missing = pd.date_range(start='2021-05-21 19:00', end = '2021-05-24 01:15', freq = '15min')
that is:
>>> missing
DatetimeIndex(['2021-05-21 19:00:00', '2021-05-21 19:15:00',
'2021-05-21 19:30:00', '2021-05-21 19:45:00',
'2021-05-21 20:00:00', '2021-05-21 20:15:00',
'2021-05-21 20:30:00', '2021-05-21 20:45:00',
'2021-05-21 21:00:00', '2021-05-21 21:15:00',
...
'2021-05-23 23:00:00', '2021-05-23 23:15:00',
'2021-05-23 23:30:00', '2021-05-23 23:45:00',
'2021-05-24 00:00:00', '2021-05-24 00:15:00',
'2021-05-24 00:30:00', '2021-05-24 00:45:00',
'2021-05-24 01:00:00', '2021-05-24 01:15:00'],
dtype='datetime64[ns]', length=218, freq='15T')
The problem
I am only interested in missing data if ( the day is between monday and thursday ) or ( the day is friday and the time is less than 21:45 ) or ( the day is sunday and the time is greater than 22:00 ).
To accomplish this filtering I do:
>>> missing_weekday = missing.weekday
>>> missing_time = missing.time
I create a mask in this way:
>>> missing_mask = (missing_weekday < 4) | ( (missing_weekday == 4) & (missing_time <= datetime.time(21, 45)) ) | ( (missing_weekday == 6) & (missing_time >= datetime.time(22, 00)) )
and finally have the filtered DatetimeIndex:
>>> missing[missing_mask]
DatetimeIndex(['2021-05-21 19:00:00', '2021-05-21 19:15:00',
'2021-05-21 19:30:00', '2021-05-21 19:45:00',
'2021-05-21 20:00:00', '2021-05-21 20:15:00',
'2021-05-21 20:30:00', '2021-05-21 20:45:00',
'2021-05-21 21:00:00', '2021-05-21 21:15:00',
'2021-05-21 21:30:00', '2021-05-21 21:45:00',
'2021-05-23 22:00:00', '2021-05-23 22:15:00',
'2021-05-23 22:30:00', '2021-05-23 22:45:00',
'2021-05-23 23:00:00', '2021-05-23 23:15:00',
'2021-05-23 23:30:00', '2021-05-23 23:45:00',
'2021-05-24 00:00:00', '2021-05-24 00:15:00',
'2021-05-24 00:30:00', '2021-05-24 00:45:00',
'2021-05-24 01:00:00', '2021-05-24 01:15:00'],
dtype='datetime64[ns]', freq=None)
This being said, my questions are:
- I feel like the missing_mask is way too twisted and may slow down the calculation of
missing[missing_mask]
, when missing is a much bigger DatetimeIndex like for example
>>> missing = pd.date_range(start='1900-05-21 19:00', end = '2021-05-24 01:15', freq = '15min')
Could it be so?
- If yes, how could I simplify the mask if possible?