0

I have a DatetimeIndex like

>>> missing = pd.date_range(start='2021-05-21 19:00', end = '2021-05-24 01:15', freq = '15min')

that is:

>>> missing
DatetimeIndex(['2021-05-21 19:00:00', '2021-05-21 19:15:00',
               '2021-05-21 19:30:00', '2021-05-21 19:45:00',
               '2021-05-21 20:00:00', '2021-05-21 20:15:00',
               '2021-05-21 20:30:00', '2021-05-21 20:45:00',
               '2021-05-21 21:00:00', '2021-05-21 21:15:00',
               ...
               '2021-05-23 23:00:00', '2021-05-23 23:15:00',
               '2021-05-23 23:30:00', '2021-05-23 23:45:00',
               '2021-05-24 00:00:00', '2021-05-24 00:15:00',
               '2021-05-24 00:30:00', '2021-05-24 00:45:00',
               '2021-05-24 01:00:00', '2021-05-24 01:15:00'],
              dtype='datetime64[ns]', length=218, freq='15T')

The problem

I am only interested in missing data if ( the day is between monday and thursday ) or ( the day is friday and the time is less than 21:45 ) or ( the day is sunday and the time is greater than 22:00 ).

To accomplish this filtering I do:

>>> missing_weekday = missing.weekday
>>> missing_time = missing.time

I create a mask in this way:

>>> missing_mask = (missing_weekday < 4) |  ( (missing_weekday == 4) & (missing_time <= datetime.time(21, 45)) ) | ( (missing_weekday == 6) & (missing_time >= datetime.time(22, 00)) )

and finally have the filtered DatetimeIndex:

>>> missing[missing_mask]
DatetimeIndex(['2021-05-21 19:00:00', '2021-05-21 19:15:00',
               '2021-05-21 19:30:00', '2021-05-21 19:45:00',
               '2021-05-21 20:00:00', '2021-05-21 20:15:00',
               '2021-05-21 20:30:00', '2021-05-21 20:45:00',
               '2021-05-21 21:00:00', '2021-05-21 21:15:00',
               '2021-05-21 21:30:00', '2021-05-21 21:45:00',
               '2021-05-23 22:00:00', '2021-05-23 22:15:00',
               '2021-05-23 22:30:00', '2021-05-23 22:45:00',
               '2021-05-23 23:00:00', '2021-05-23 23:15:00',
               '2021-05-23 23:30:00', '2021-05-23 23:45:00',
               '2021-05-24 00:00:00', '2021-05-24 00:15:00',
               '2021-05-24 00:30:00', '2021-05-24 00:45:00',
               '2021-05-24 01:00:00', '2021-05-24 01:15:00'],
              dtype='datetime64[ns]', freq=None)

This being said, my questions are:

  1. I feel like the missing_mask is way too twisted and may slow down the calculation of missing[missing_mask], when missing is a much bigger DatetimeIndex like for example
>>> missing = pd.date_range(start='1900-05-21 19:00', end = '2021-05-24 01:15', freq = '15min')

Could it be so?

  1. If yes, how could I simplify the mask if possible?
vaeVictis
  • 484
  • 1
  • 3
  • 13
  • 1
    Hi, please read this https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and update your code so we can create similar dataframes and help you – Be Chiller Too May 29 '21 at 11:33
  • 1
    @BeChillerToo read and done what you suggested. I hope it is more understandable now. – vaeVictis May 29 '21 at 11:57
  • 2
    Well I tried your code with the big DatetimeIndex, and it ran fine, I think your code is good. – Be Chiller Too May 29 '21 at 12:01

0 Answers0