# Forward-looking rolling mean: for each row, average 'values' over
# [t, t + 1h]. Label slicing on a DatetimeIndex includes both endpoints,
# even when the end label is not actually present in the index.
data['rolling_avg_val'] = 0.0
future_window = '1h'
for i in range(data.shape[0]):
    start_data_idx = data.index[i]
    end_data_idx = start_data_idx + pd.Timedelta(future_window)
    temp_avg = data['values'].loc[start_data_idx:end_data_idx].mean()
    if temp_avg == 0:
        continue
    data.loc[start_data_idx, 'rolling_avg_val'] = temp_avg
The dataframe has about 5,000,000 rows covering two weeks and looks like this, where the index is timestamps with millisecond resolution:
2017-04-12 12:19:04.987 0
2017-04-12 12:19:05.157 1
2017-04-12 12:19:05.297 0
2017-04-12 12:19:05.330 0
2017-04-12 12:19:05.487 0
2017-04-12 12:19:05.530 1
2017-04-12 12:19:05.640 0
2017-04-12 12:19:05.703 1
As you can see, the timestamps are not evenly spaced, and end_data_idx is not necessarily present in the index, but pandas still resolves the proper range when slicing. A common solution I see is to fill in the missing time intervals and then shift the results. However, that makes memory usage explode, so I am trying to avoid it.
On my machine, this processes about 800 rows per second. Is there another way to think about this problem to get a faster solution?
edit:
Expected output:
rolling_avg_val
2017-04-12 12:19:04.987 0.375
2017-04-12 12:19:05.157 0.429
2017-04-12 12:19:05.297 0.333
2017-04-12 12:19:05.330 0.4
2017-04-12 12:19:05.487 0.5
2017-04-12 12:19:05.530 0.667
2017-04-12 12:19:05.640 0.5
2017-04-12 12:19:05.703 1.0
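For reference, here is one vectorized way to think about the problem, sketched on a toy frame mirroring the sample data above: because the index is sorted, each forward window is a contiguous slice, so `searchsorted` can find every window's end position in one call, and a prefix sum turns each window mean into a subtraction and a division. This is a sketch under the assumption of a sorted DatetimeIndex and a `values` column, not the asker's code.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question's layout: millisecond timestamps, 0/1 values.
idx = pd.to_datetime([
    "2017-04-12 12:19:04.987", "2017-04-12 12:19:05.157",
    "2017-04-12 12:19:05.297", "2017-04-12 12:19:05.330",
    "2017-04-12 12:19:05.487", "2017-04-12 12:19:05.530",
    "2017-04-12 12:19:05.640", "2017-04-12 12:19:05.703",
])
data = pd.DataFrame({"values": [0, 1, 0, 0, 0, 1, 0, 1]}, index=idx)

# For each row, the position just past the last row inside [t, t + 1h].
# side="right" matches label slicing, which includes the end timestamp.
ends = data.index.searchsorted(data.index + pd.Timedelta("1h"), side="right")
starts = np.arange(len(data))

# Prefix sums: each window's sum is csum[end] - csum[start].
csum = np.concatenate(([0.0], np.cumsum(data["values"].to_numpy())))
data["rolling_avg_val"] = (csum[ends] - csum[starts]) / (ends - starts)
```

This replaces the per-row Python loop and slicing with a few array operations over the whole frame, so the cost is dominated by one `searchsorted` over the index rather than 5,000,000 label lookups.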