I have a data set of timestamps and values in pandas. The interval between timestamps is ~12 seconds over a total timespan of roughly one year, but some points are missing (i.e., the time series is irregular, so I can't use fixed window sizes).
I want to compute rolling averages of the values over 1 minute intervals, but I'm not getting the behavior I expected. I found a similar issue here, but that one used the sum and predates pandas 0.19.0. I am using pandas 0.23.4.
Sample Data
Time, X
2018-02-02 21:27:00, 75.4356
2018-02-02 21:27:12, 78.29821
2018-02-02 21:27:24, 73.098345
2018-02-02 21:27:36, 78.3331
2018-02-02 21:28:00, 79.111
Note that 2018-02-02 21:27:48 is missing.
For a rolling sum, I could just fill the missing values with 0s, but for the mean I don't want the missing points to be counted as real data points (that is, I want each window's value to be sum(data points in the interval) / number of data points in the interval).
I'd like to do it for varying segments of time (i.e., 1min, 5min, 15min, 1hr, etc).
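To make the requirement concrete, here is a minimal sketch of what I mean, assuming the time column is used as the index and the window is given as a duration string. The column names (avg_1min, avg_1min_n, and so on) are made up for illustration. The count column shows how many real points each window contains, so nothing is filled in for the missing timestamp.

```python
import pandas as pd

# Sample data from the question; the 21:27:48 reading is absent.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-02-02 21:27:00",
        "2018-02-02 21:27:12",
        "2018-02-02 21:27:24",
        "2018-02-02 21:27:36",
        "2018-02-02 21:28:00",
    ]),
    "X": [75.4356, 78.29821, 73.098345, 78.3331, 79.111],
})

# Index by time so the window can be expressed as a duration.
s = df.set_index("Time")["X"]

# Hypothetical output column names mapped to pandas offset strings.
windows = {"avg_1min": "1min", "avg_5min": "5min",
           "avg_15min": "15min", "avg_1h": "1h"}

result = pd.DataFrame(index=s.index)
for name, window in windows.items():
    roll = s.rolling(window)
    # mean() divides by the number of rows actually inside the window,
    # so the missing timestamp is not counted as a data point.
    result[name] = roll.mean()
    # Number of real observations that went into each average.
    result[name + "_n"] = roll.count()

print(result)
```

With only one minute of sample data, the 5min, 15min and 1h columns simply average everything seen so far.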
The closest I got to getting actual values was to do:
Code
df['rolling_avg'] = df.rolling('1T', on='Time').X.mean()
My understanding is that would be the 1 minute rolling averages.
But then, I'm not sure how to interpret the output. I would have expected NaNs for the first minute, since there is not yet a full window to base the rolling average on, but instead I get values.
Output
                  Time          X  rolling_avg
0  2018-02-02 21:27:00    75.4356    75.435600
1  2018-02-02 21:27:12   78.29821    76.866905
2  2018-02-02 21:27:24  73.098345    75.610718
3  2018-02-02 21:27:36    78.3331    76.291314
4  2018-02-02 21:28:00     79.111    77.210164
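For reference, a self-contained sketch that reproduces this output and checks the last row by hand. It assumes the default window is closed on the right only, i.e. each row averages the points with Time in (t - 1min, t]. The names in_window and manual_last are just for illustration.

```python
import pandas as pd

# Sample data from the question; the 21:27:48 reading is absent.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-02-02 21:27:00",
        "2018-02-02 21:27:12",
        "2018-02-02 21:27:24",
        "2018-02-02 21:27:36",
        "2018-02-02 21:28:00",
    ]),
    "X": [75.4356, 78.29821, 73.098345, 78.3331, 79.111],
})

# "1min" denotes the same one-minute offset as the "1T" alias above.
df["rolling_avg"] = df.rolling("1min", on="Time")["X"].mean()

# Manual check for the last row: points with Time in (t - 1min, t].
t = df["Time"].iloc[-1]
in_window = (df["Time"] > t - pd.Timedelta("1min")) & (df["Time"] <= t)
manual_last = df.loc[in_window, "X"].mean()

print(df)
print(in_window.sum(), manual_last)
```

Under that assumption the 21:27:00 point falls just outside the last window, which would explain why the last average is based on four points rather than five.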
Basically, in this output, df[1].rolling_avg is (Value[0]+Value[1])/2, though the interval was 12 seconds, not 1 minute.
Is there a way to do what I am trying to do or do I need to write a for-loop to do this manually?
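A possible explanation, offered as a sketch rather than a definitive answer: for windows given as a time offset, pandas documents a default of min_periods=1, so a value is produced as soon as the look-back window contains at least one point. The early rows are then averages over whatever data exists so far, not over a full minute. Since min_periods counts observations rather than elapsed time, one way to get NaNs until a full window of history is available is to mask the result by elapsed time. The names window, avg and elapsed below are illustrative only.

```python
import pandas as pd

# Sample data from the question; the 21:27:48 reading is absent.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-02-02 21:27:00",
        "2018-02-02 21:27:12",
        "2018-02-02 21:27:24",
        "2018-02-02 21:27:36",
        "2018-02-02 21:28:00",
    ]),
    "X": [75.4356, 78.29821, 73.098345, 78.3331, 79.111],
})

window = pd.Timedelta("1min")

# Time-based rolling mean; partial windows at the start still yield values.
avg = df.rolling("1min", on="Time")["X"].mean()

# Time elapsed since the first observation in the data set.
elapsed = df["Time"] - df["Time"].iloc[0]

# Keep the average only once a full window of history exists, else NaN.
df["rolling_avg"] = avg.where(elapsed >= window)

print(df)
```

On the sample data this leaves NaN in the first four rows and keeps only the 21:28:00 value. Whether the boundary should be >= or > is a choice that depends on how the first full window is defined.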