
I am working on a data processing project in which I would generally like to take a 1D numpy array as input and output an equal-length array whose elements are generated by processing a certain number of input elements. This is a relatively simple problem to solve with a for loop, but I am wondering if numpy has a built-in way of doing this, which I assume would be significantly faster.

To illustrate my goals, imagine generating a vector B one element at a time, and let the current element being generated be element i (denoted B[i]).

Say I want B to be a vector whose elements correspond to a simple moving average of the elements in vector A. What I want to be able to say is

B[i] = AVG(A[(i-N):i])  # N <= i < len(A)

where i is the iteration index of whatever underlying loop is running, and AVG is a generic function which calculates the average of the group of numbers passed to it.
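
For concreteness, a minimal version of the loop I have in mind (B[:N] is left as NaN here, since no full window exists yet):

import numpy as np

A = np.random.rand(100)  # example input
N = 10                   # window length
B = np.full(len(A), np.nan)

for i in range(N, len(A)):
    B[i] = np.mean(A[i - N:i])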

As I said, easy enough with a for loop, but this seems like something numpy should be able to do quite easily, so I thought I'd ask the pros before I litter my code with less-than-optimal structures.

Mad Physicist
Josh Wiens
  • Unfortunately, this would require a specific example to answer. – Mad Physicist Jun 13 '17 at 17:20
  • @J.C.Leitão. That question has nothing to do with this one. OP is trying to step away from using raw Python and use numpy instead. – Mad Physicist Jun 13 '17 at 17:21
  • Have you played with functions like `np.cumsum` and `np.cumprod`? – hpaulj Jun 13 '17 at 17:24
  • @MadPhysicist Is the pseudo code not specific enough? I believe I was quite thorough in explaining my goals. I'm only wondering if numpy has a built-in way of doing what I did in the sudo code. – Josh Wiens Jun 13 '17 at 17:25
  • @hpaulj I have used cumsum for some applications, but when trying to use it with things that measure movement information (such as standard deviation), I don't think it really works. – Josh Wiens Jun 13 '17 at 17:29
  • Meant to say "pseudo" (noticed after 5 minutes), the terminal is getting to me. – Josh Wiens Jun 13 '17 at 17:37
  • If that operation is expressible as NumPy ufuncs, you could create sliding windows, as shown [`here`](https://stackoverflow.com/a/40085052/3293881), and then use those ufuncs along the last axis of the 2D array of sliding windows (a sketch of this appears just after these comments). If there's a specific operation that you are after, we could find better optimized ways. – Divakar Jun 13 '17 at 17:40
  • @JoshWiens. My fault entirely. I misread: forgot to read the end of your sentence "which calculates the average of the group of numbers passed to it". I thought AVG was a placeholder for *any* function at all. – Mad Physicist Jun 13 '17 at 18:13
  • Couple of dupes: https://stackoverflow.com/q/14313510/2988730, https://stackoverflow.com/q/13728392/2988730 – Mad Physicist Jun 13 '17 at 18:17
  • By the way, did you mean `A[i-N:i+1]` in your question? The stop index is exclusive. – Mad Physicist Jun 13 '17 at 18:22
  • possible duplicate of https://stackoverflow.com/q/13728392/52074 – Trevor Boyd Smith Apr 24 '19 at 13:23
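
Following up on the sliding-windows comment above, here is a sketch of that idea using numpy's sliding_window_view (available in numpy 1.20+; the linked answer builds the windows with as_strided instead):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 5
A = np.random.rand(100)

# Each row of W is one length-N window of A; W has shape (len(A) - N + 1, N).
W = sliding_window_view(A, N)

# Any reduction along the last axis yields a rolling statistic.
rolling_mean = W.mean(axis=-1)
rolling_std = W.std(axis=-1)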

3 Answers


Check out the standard moving window functions in pandas. For example, the moving average with window size 10 is pd.rolling_mean(data, window=10). (Note that these top-level functions were deprecated and later removed; in modern pandas the equivalent is data.rolling(window=10).mean().)

You can also provide your own aggregation function with pd.rolling_apply(data, lambda x: np.mean(x), window=10), which gives the same result as the previous call (modern equivalent: data.rolling(window=10).apply(np.mean)).
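
As a concrete sketch of the modern rolling API (assuming data is a pandas Series):

import numpy as np
import pandas as pd

data = pd.Series(np.random.rand(100))

# Moving average over a 10-sample window; the first 9 entries are NaN
# because no full window exists there yet.
avg = data.rolling(window=10).mean()

# The same thing via a user-supplied aggregation function.
avg2 = data.rolling(window=10).apply(np.mean)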

    While Pandas is definitely not off the table, right now I'm really specifically wondering about numpy. That does look like exactly what I want to do though in the end, thanks for the tip! – Josh Wiens Jun 13 '17 at 17:32
  • I am accepting this answer despite it not quite adhering to the initial question. @Mad Physicist's answer is a good solution for the specific question asked, but given a project where you would like to apply a multitude of different processing algorithms, this solution is drastically more applicable. Pandas is worth using for this function alone imo. – Josh Wiens Jun 13 '17 at 22:24

A bit low-level, but you can filter the data by cross-correlating it with a window of your choosing. A moving-average window is just a bunch of ones divided by the number of ones. Note that correlate has various "modes", and the validity of the start/end points varies between them.

import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt

window_size = 10
# Moving-average kernel: ones normalized to sum to 1.
window = np.ones(window_size) / window_size
x = np.random.rand(100)

# mode='same' gives an output the same length as x; points near the
# edges are computed from partial (zero-padded) windows.
x_filt = signal.correlate(x, window, mode='same')

f, ax = plt.subplots()
ax.plot(x)
ax.plot(x_filt)
plt.show()
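
Since the boxcar window here is symmetric, correlation and convolution coincide, so a pure-numpy equivalent (no scipy required) would be:

x_filt_np = np.convolve(x, window, mode='same')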

Nick T

The simplest pure numpy solution that does not use convolution is the one using np.cumsum. The basic idea is that the sum of the N elements ending at index i (that is, A[i-N+1] through A[i]) is the cumulative sum up to i minus the cumulative sum up to i - N. The normalization is just N itself:

s = np.cumsum(A)
# s[N:] - s[:-N] is the sum over each window of N consecutive elements
B = (s[N:] - s[:-N]) / N

It is not clear if you want B to be the same length as A. If so, you could, for example, prepend the partial averages of the first N elements (the first N cumulative sums, each divided by the number of elements it covers) using np.concatenate or np.r_:

B = np.concatenate((s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N))

OR

B = np.r_[s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N]

After writing this, I realized that @Jaime has a very similar answer to basically the same question here. I am going to retain my answer because it correctly normalizes the initial portion of the array, which I am not convinced Jaime's answer does.
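
A quick sanity check of this against a naive loop (illustrative; A and N are placeholder names):

import numpy as np

N = 5
A = np.random.rand(20)

s = np.cumsum(A)
B = np.r_[s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N]

# Naive loop: average the trailing window of up to N elements at each index.
loop = np.array([A[max(0, i - N + 1):i + 1].mean() for i in range(len(A))])
assert np.allclose(B, loop)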

Mad Physicist