I have a large DataFrame with 5-minute granularity, and I'd like to find values within a time span:

                     Time      X    
Date                                        
2019-09-13 06:00:00 06:00:00    1200    
2019-09-13 06:05:00 06:05:00    1250    
2019-09-13 06:10:00 06:10:00    1270    
2019-09-13 06:15:00 06:15:00    1240    
2019-09-13 06:20:00 06:20:00    1250    
2019-09-13 06:25:00 06:25:00    1230    

The goal is to find the maximum of X (`X.max()`) within a window, for instance between 06:00 and 07:00, on each day.
How would you do this?

Mark T

1 Answer

You can use `between_time` to keep only the rows within the hour, then `resample` by day to aggregate. The example here uses `sum`; replace it with `max` to get the maximum:

df.between_time('06:00:00','07:00:00')['X'].resample('D').sum()

Output:

Date
2019-09-13    7440
Freq: D, Name: X, dtype: int64
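
For the maximum the question asks for, the same pattern should work with `max` in place of `sum`. This is a minimal sketch, assuming the sample frame from the question is rebuilt by hand with a `DatetimeIndex` named `Date` (the redundant `Time` column is left out):

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
idx = pd.date_range("2019-09-13 06:00", periods=6, freq="5min", name="Date")
df = pd.DataFrame({"X": [1200, 1250, 1270, 1240, 1250, 1230]}, index=idx)

# Keep rows between 06:00 and 07:00, then take the maximum per calendar day.
daily_max = df.between_time("06:00", "07:00")["X"].resample("D").max()
print(daily_max)  # one row for 2019-09-13 with value 1270
```

Note that `between_time` needs a `DatetimeIndex`; if `Date` is a regular column, `df.set_index("Date")` would be needed first.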
Quang Hoang
  • Nice answer, can you tell if both values are inclusive? – Terry Jun 18 '20 at 16:24
  • @Terry yes. `between_time` has default `include_start=True` and `include_end=True`. – Quang Hoang Jun 18 '20 at 16:26
  • `df.index.dayofweek` will help you check for weekend. For business day, check out [this question](https://stackoverflow.com/questions/13019719/get-business-days-between-start-and-end-date-using-pandas). – Quang Hoang Jun 18 '20 at 17:31
  • The problem with this solution is that `resample` creates a continuous range of dates and hours that are not in the original df. I used your solution with `max()`: `df.between_time('08:00:00','09:00:00')['X'].resample('H').max()`, but I get every hour from 08:00 to 00:00 listed – Mark T Jun 18 '20 at 18:32
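
One possible way around the empty bins described in the last comment is to group by the calendar date of the index instead of resampling, so that only dates present in the data show up. The sketch below uses hypothetical data (two non-adjacent days, with made-up values), not the asker's real frame. It also shows the simpler option of dropping the empty bins after resampling:

```python
import pandas as pd

# Hypothetical data: two non-adjacent days, 5-minute readings from 08:00 to 08:55.
idx = pd.date_range("2019-09-13 08:00", periods=12, freq="5min").append(
    pd.date_range("2019-09-16 08:00", periods=12, freq="5min")
)
df = pd.DataFrame({"X": range(24)}, index=idx)

window = df.between_time("08:00", "09:00")["X"]

# resample builds a regular grid, so the missing days in between appear as NaN.
resampled = window.resample("D").max()

# Grouping by the date part of the index only yields dates that actually occur.
grouped = window.groupby(window.index.date).max()

# Alternatively, keep resample and drop the empty bins afterwards.
resampled_clean = resampled.dropna()

print(grouped)
```

The `groupby` version keeps the integer dtype and uses plain `datetime.date` objects as the index, whereas the `resample(...).dropna()` version keeps a `DatetimeIndex` but converts the values to floats because of the intermediate NaNs.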