22

I have a simple time series and I am struggling to estimate the variance within a moving window. More specifically, I cannot figure some issues out relating to the way of implementing a sliding window function. For example, when using NumPy and window size = 20:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides) 

rolling_window(data, 20)
np.var(rolling_window(data, 20), -1)
datavar=np.var(rolling_window(data, 20), -1)

Perhaps I am mistaken somewhere, in this line of thought. Does anyone know a straightforward way to do this? Any help/advice would be most welcome.

rAntonioH
  • 129
  • 2
  • 12
Barry
  • 377
  • 2
  • 4
  • 7

7 Answers7

29

The Pandas rolling_mean and rolling_std functions have been deprecated and replaced by a more general "rolling" framework. @elyase's example can be modified to:

import pandas as pd
import numpy as np
%matplotlib inline

# some sample data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000)).cumsum()

#plot the time series
ts.plot(style='k--')

# calculate a 60 day rolling mean and plot
ts.rolling(window=60).mean().plot(style='k')

# add the 20 day rolling standard deviation:
ts.rolling(window=20).std().plot(style='b')

The rolling function supports a number of different window types, as documented here. A number of functions can be called on the rolling object, including var and other interesting statistics (skew, kurt, quantile, etc.). I've stuck with std since the plot is on the same graph as the mean, which makes more sense unit-wise.

sfjac
  • 7,119
  • 5
  • 45
  • 69
18

You should take a look at pandas. For example:

import pandas as pd
import numpy as np

# some sample data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000)).cumsum()

#plot the time series
ts.plot(style='k--')

# calculate a 60 day rolling mean and plot
pd.rolling_mean(ts, 60).plot(style='k')

# add the 20 day rolling variance:
pd.rolling_std(ts, 20).plot(style='b')

enter image description here

elyase
  • 39,479
  • 12
  • 112
  • 119
  • 3
    I think Barry is looking for a rolling variance, not a rolling standard deviation. He can square the std to get the variance or use pd.rolling_var(ts, 20).plot(style='b'). – vlmercado May 17 '17 at 17:01
  • 1
    Now, as `pandas` get updated, the syntax changes. See [the docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.std.html#pandas.core.window.Rolling.std) for more. – StSav012 Dec 24 '19 at 17:32
14

Despite being an old thread, I'll add another method modified from this, that doesn't rely on pandas, nor python loops. Essentially, using numpy's stride tricks you can first create a view of an array with striding such that computing a statistic of the function along the last axis is equivalent to performing the rolling statistic. I've modified the original code so that the output shape is the same as the input shape by padding add the start of the last axis.

import numpy as np

def rolling_window(a, window):
    pad = np.ones(len(a.shape), dtype=np.int32)
    pad[-1] = window-1
    pad = list(zip(pad, np.zeros(len(a.shape), dtype=np.int32)))
    a = np.pad(a, pad,mode='reflect')
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(30).reshape((5,6))

# rolling mean along last axis
np.mean(rolling_window(a, 3), axis=-1)

# rolling var along last axis
np.var(rolling_window(a, 3), axis=-1)

# rolling median along last axis
np.median(rolling_window(a, 3), axis=-1)

Josh Albert
  • 1,064
  • 13
  • 16
  • 1
    Thanks for the np-only solution. Although, I need to wrap my head around the padding and striding, later. Right now, it does want I needed.. Cheers! – stevosn Apr 06 '20 at 22:00
  • Given your initial `a.shape` being `(5,6)`, why is the output of `rolling_window(a, 3)` of shape `(6, 6, 3)`? This occurs for any `a.shape` of `(n ,m)`, the output will always be `(n+1, m, window)`. Where does that extra point in the first dimension come from, and should it be there? I'm using Python 3.8.8 with NumPy 1.20.1 – Adriaan Dec 28 '21 at 08:57
5

Using Pandas for pure numerical data is a bit of an overkill in my opinion; Bottleneck works great but hasn't been updated since January 2021 and no longer works for Python 3.9 and newer; so I'll post a version based on Josh Albert's version, keeping in mind the documentation note on lib.stride_tricks.as_strided that it might be unsafe to use.

You can use NumPy's lib.stride_tricks.sliding_window_view(), which is basically a safe(ish) wrapper around lib.stride_tricks.as_strided, to create an array with an extra axis with the size of the window (in any number of dimensions), allowing you to use NumPy's built-in statistical functions to operate across that axis:

import numpy as np

window = 3  # size of the window
A = np.arange(10)

Aw = np.lib.stride_tricks.sliding_window_view(A, window)
Avar = np.var(Aw, axis=-1)

Avar
>>> array([0.66666667, 0.66666667, 0.66666667, 0.66666667, 0.66666667,
       0.66666667, 0.66666667, 0.66666667])

And of course this also works for mean, max, min, std etc.

Note: as far as I can see, there's no way to include the "edges" of the array, i.e. the beginning and end of A where the full window length cannot be attained. The resulting array will thus be shorted to that part where the full window length can be reached, see the documentation on the return.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
3

I was just looking for the same solution, and found that the bottleneck package should do the trick quite reliably and quickly. Here is slightly adjusted example from https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.move_var:

>>> import bottleneck as bn
>>> a = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
>>> bn.move_var(a, window=2)
array([ nan,  0.25,  0.25,  nan,  nan])
>>> bn.move_var(a, window=2, min_count=1)
array([ 0. ,  0.25,  0.25,  0. ,  0. ])

Note that the resulting variance correspond to the last index of the window.

The package is available from Ubuntu repos, pip etc. It can operate over arbitrary axis of numpy-array etc. Besides that, it is claimed to be faster than plain-numpy implementation in many cases.

Roux
  • 435
  • 3
  • 12
  • 1
    Bottleneck works fantastic, but only on Python <3.8 alas. The dev hasn't answered any issues on Github pertaining to bugs in Python >3.9 for almost a year sadly. – Adriaan Dec 29 '21 at 10:08
0

Here's a simple way to calculate moving averages (or any other operation within a time window) using plain Python.

You may change the time window by changing the value in the window variable. For example, if you wanted a 30 minute time window, you would change the number to 3000000000.

In this example, the entries are saved in a dictionary named data. However, you may get this data from whatever collection is suitable for you.

you may save the result to whatever collection or database you like.

data = {}

def one_min_avg():
    window = int(datetime.now().strftime("%H%M%S%f")) - 100000000
    history = {}
    for i in message_log.items():
        if i[0] >= window:
            history.update({i})
    for i in list(history):
        if i < window:
            history.pop(i)
    avg = sum(history.values()) / len(list(history))
    return avg

Note: You may want to add some error handling to avoid division by zero or if the function can't access your data.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
  • How does the timing here compare to the other answers? One of the reasons this question comes up so often, is that a simple, naive loop is usually not very fast for this problem. – Adriaan May 30 '22 at 05:25
-1

A simple way to do (almost) any rolling/moving calculation is to do a convolution!

In this case you can convolve the standard deviation formula in your data with a set window size (ws).

moving_std = np.sqrt(np.convolve((data - np.mean(data))**2, np.ones(ws)/ws, mode='valid'))

Just don't forget that when you plot this, the first point of the moving_std will be shifted ws spaces to the left, compared to your data, because of the window size. So you need to account for that by adding the window size to your x axis.

  • 1
    I don't think this is correct. Don't you want to use the mean of the data in the window, not the whole series? If a portion of the data has a significantly different mean from the whole, won't that result in a much larger standard deviation than expected? – Mark H Jul 13 '23 at 22:15