Tbh, I'm not really sure how to ask this question. I've got an array of values, and I want to compute a smoothed average of those values moving forward through the array. In Excel, the calculation is:

  • average_val_1 = mean of the first window_size values
  • average_val_2 = (value at location window_size+1 * (window_size-1) + average_val_1) / window_size
  • average_val_3 = (value at location window_size+2 * (window_size-1) + average_val_2) / window_size

etc., etc.
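
Written out as a recurrence, the steps above look like the sketch below. The symbols are notation assumed here, not part of the spreadsheet: w stands for window_size, x_i for the value at (1-based) location i, and a_k for average_val_k.

```latex
a_1 = \frac{1}{w} \sum_{i=1}^{w} x_i,
\qquad
a_k = \frac{(w - 1)\, x_{w+k-1} + a_{k-1}}{w} \quad \text{for } k \ge 2 .
```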

In pandas and numpy, my code for this is the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({'av':np.nan, 'values':np.random.rand(10)})
df = df[['values','av']]

window = 5
df['av'].iloc[5] = np.mean(df['values'][:5])

for i in range(window+1,len(df.index)):
  df['av'].iloc[i] = (df['values'].iloc[i] * (window-1) + df['av'].iloc[i-1])/window

Which returns:

    values      av
0   0.418498    NaN
1   0.570326    NaN
2   0.296878    NaN
3   0.308445    NaN
4   0.127376    NaN
5   0.381160    0.344305
6   0.239725    0.260641
7   0.928491    0.794921
8   0.711632    0.728290
9   0.319791    0.401491

These are the values I am looking for, but there has to be a better way than using for loops. I think the answer has something to do with using exponentially weighted moving averages, but I'll be damned if I can figure out the syntax to make any sense of that.

Any suggestions?

RPM
  • Possible duplicate of [How to calculate moving average using NumPy?](https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy). I'm also a fan of stride tricks - which this answer uses - [https://stackoverflow.com/a/39919709/2823755](https://stackoverflow.com/a/39919709/2823755) – wwii Jul 26 '18 at 18:34
  • I'm not familiar with the intricacies of the different scipy window functions, but using df.rolling() with one of the non-evenly-weighted windowing functions from https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions may be a place to start. – Silenced Temporarily Jul 26 '18 at 18:44

1 Answer

You can use ewm, like this:

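window = 5
df['av'] = np.nan
# seed row `window` with the mean of the first `window` values
# (assign through .loc rather than chained indexing, which may not write to df)
df.loc[window, 'av'] = np.mean(df['values'][:window])
# fill the rows after the seed with the raw values, then smooth recursively
df.loc[window:,'av'] = (df.loc[window:,'av'].fillna(df['values'])
                          .ewm(adjust=False, alpha=(window-1.)/window).mean())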

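This gives the same result as your for loop. For it to work, column 'av' must be NaN everywhere except the seed row; otherwise the fillna with column 'values' will not happen and the values calculated in 'av' will be wrong. The alpha parameter of ewm sets the weight given to the row being calculated: with adjust=False, each output is alpha * (current value) + (1 - alpha) * (previous output), so alpha = (window-1)/window reproduces your formula.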
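
As a quick sanity check, here is a minimal sketch that compares the explicit loop with the ewm version on random data. The seeded generator, the `loop_av` array, and the `matches` flag are names introduced here for illustration only, and the default integer index is assumed so that `.loc` labels coincide with positions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
window = 5
df = pd.DataFrame({"values": rng.random(10)})

# Reference: the explicit loop from the question, on a plain NumPy array.
loop_av = np.full(len(df), np.nan)
loop_av[window] = df["values"].iloc[:window].mean()
for i in range(window + 1, len(df)):
    loop_av[i] = (df["values"].iloc[i] * (window - 1) + loop_av[i - 1]) / window

# ewm version: seed row `window`, fill the later rows with the raw values,
# then apply the recursive (adjust=False) exponential weighting.
df["av"] = np.nan
df.loc[window, "av"] = df["values"].iloc[:window].mean()
df.loc[window:, "av"] = (
    df.loc[window:, "av"]
    .fillna(df["values"])
    .ewm(adjust=False, alpha=(window - 1) / window)
    .mean()
)

# True when both approaches agree, treating NaN in the same rows as equal.
matches = np.allclose(df["av"].to_numpy(), loop_av, equal_nan=True)
print(matches)
```

The comparison uses `equal_nan=True` because the first `window` rows are NaN in both results.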

Note: while this code does the same as yours, I would recommend having a look at this line in your code:

df['av'].iloc[5] = np.mean(df['values'][:5])

Because slicing excludes the upper bound, df['values'][:5] is:

0    0.418498
1    0.570326
2    0.296878
3    0.308445
4    0.127376
Name: values, dtype: float64

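The mean therefore covers rows 0 to 4 but is stored in row 5, and the value in row 5 is never used. So I think what you should do is df['av'].iloc[4] = np.mean(df['values'][:5]). If you agree, then my code above must be changed slightly: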

# seed row `window-1` so the mean sits on the last row it covers
df.loc[window-1, 'av'] = np.mean(df['values'][:window])
df.loc[window-1:,'av'] = (df.loc[window-1:,'av'].fillna(df['values'])
                            .ewm(adjust=False, alpha=(window-1.)/window).mean())
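
The shifted version can also be wrapped in a small function. This is only a sketch: `smoothed_average` is a hypothetical helper name, not a pandas function, and it assumes a Series with a unique index. It works by position, so it does not depend on the index labels being 0, 1, 2, ...

```python
import pandas as pd


def smoothed_average(values: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: seed position window-1 with the mean of the
    first `window` values, then smooth the remaining rows recursively."""
    if not 1 < window <= len(values):
        raise ValueError("window must be greater than 1 and at most len(values)")

    seeded = values.astype(float).copy()
    # Replace the value at the seed position with the initial mean.
    seeded.iloc[window - 1] = values.iloc[:window].mean()

    # adjust=False gives out[t] = alpha * x[t] + (1 - alpha) * out[t-1],
    # with the first output equal to the seed.
    smoothed = (
        seeded.iloc[window - 1:]
        .ewm(adjust=False, alpha=(window - 1) / window)
        .mean()
    )
    # Rows before the seed position have no average yet, so they become NaN.
    return smoothed.reindex(values.index)


example = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
result = smoothed_average(example, window=5)
print(result)
```

With this input the seed is the mean of 1 to 5, i.e. 3.0, the next row is (6 * 4 + 3.0) / 5 = 5.4, and the last row is (7 * 4 + 5.4) / 5 = 6.68.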
Ben.T