0

I have a Pandas Dataframe with 5 minute data over the course of days. A sample of the data looks like

timestamp,name,value
2020-04-30 00:00:00,a,21.1018
2020-04-30 00:05:00,a,-3.7804
2020-04-30 00:10:00,a,2.6110
2020-04-30 00:15:00,a,-20.0046
2020-04-30 00:20:00,a,-21.7144
2020-04-30 00:25:00,a,22.2500
2020-04-30 00:30:00,a,16.9082
2020-04-30 00:35:00,a,14.8040
2020-04-30 00:40:00,a,7.3906
2020-04-30 00:45:00,a,97.7612
2020-04-30 00:50:00,a,6.0274
2020-04-30 00:55:00,a,24.4248
2020-04-30 01:00:00,a,173.8800
2020-04-30 01:05:00,a,155.7417

What I would like to be able to do is take the data from 00:05 to 01:00 and get the mean of it and do the same for all such intervals in the day, essentially yyyy-mm-dd hh:05 to yyyy-mm-dd (hh + 1):00. I can not assume that the dataframe's data will start or end at a specific time point, but can guarantee 5 minute intervals. This does not have to be done with Pandas if there is another way.

The expected output for the data given would be 26.71, so selecting the time interval between 00:05 to 01:00 and taking the mean of the values. I would then repeat this for the entire day.

usr1
  • 13
  • 2
  • 1
    kindly post expected output – sammywemmy Jun 05 '20 at 00:57
  • 1
    Can you clarify what exactly the issue is? – AMC Jun 05 '20 at 01:04
  • 1
    This is ambiguous. What if there is data with timestamp `... hh:01`? Actually, my best guess is that you want such t that `... hh:00 < t <= ... (hh+1):00` or `... hh:05 <= t < ... (hh+1):05` – InQβ Jun 05 '20 at 01:07
  • @no1xsyzy I can assume 5 minute intervals just not the starting time or ending time in the data, but to be more general ```... hh:05 <= t < ... (hh+1):05 ``` would be correct as well. – usr1 Jun 05 '20 at 01:25
  • @AMC the main issue is a good way to select the time frame described, for an entire days worth of data, given the data can have an arbitrary starting and ending 5 minute interval. – usr1 Jun 05 '20 at 01:32
  • @usr1 Are you looking for something like [Select DataFrame rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) ? What do you mean by _given the data can have an arbitrary starting and ending 5 minute interval_ ? – AMC Jun 05 '20 at 01:34
  • @AMC arbitrary in that my data may start or end at ```... hh:10``` or some other 5 minute interval, but I only want to select the values in the complete intervals of ```... hh:05 <= t < ... (hh+1):05``` for all timestamps in my dataframe. – usr1 Jun 05 '20 at 01:39
  • You should be able to use the basic date/time handling functionality offered by Pandas for this, no? In the example in your post, why is the time range `00:05` -> `01:00` ? – AMC Jun 05 '20 at 01:42

1 Answers1

0

You can do an offset:

# convert to datetime if not already is
df['timestamp'] = pd.to_datetime(df['timestamp'])

offset = pd.to_timedelta('5M')
df.groupby(df['timestamp'].sub(offset).dt.floor('H')).mean()

Output:

                         value
timestamp                     
2020-04-29 23:00:00   21.10180
2020-04-30 00:00:00   26.71315
2020-04-30 01:00:00  155.74170
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74