
I am working on a data processing project in which I would generally like to take a 1D numpy array as input and output an equal-length array whose elements are generated by processing a certain number of input elements. This is a relatively simple problem to solve with a for loop, but I am wondering if numpy has a built-in way of doing this, which I assume would be significantly faster.

To illustrate my goals, imagine generating a vector B one element at a time, and let the current element being generated be element i (denoted B[i]).

Say I want B to be a vector whose elements correspond to a simple moving average of the elements in vector A. What I want to be able to say is

B[i] = AVG(A[(i-N):i])  # N <= i < len(A)

where i is the iteration index of whatever underlying loop is running, and AVG is a generic function which calculates the average of the group of numbers passed to it.
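
For concreteness, a minimal version of the loop I have in mind (B[:N] is left as NaN here, since no full window exists yet):

import numpy as np

A = np.random.rand(100)  # example input
N = 10                   # window length
B = np.full(len(A), np.nan)

for i in range(N, len(A)):
    B[i] = np.mean(A[i - N:i])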

As I said, easy enough with a for loop, but this seems like something numpy should be able to do quite easily, so I thought I'd ask the pros before I litter my code with less-than-optimal structures.

Mad Physicist
Josh Wiens
  • Unfortunately, this would require a specific example to answer. – Mad Physicist Jun 13 '17 at 17:20
  • @J.C.Leitão. That question has nothing to do with this one. OP is trying to step away from using raw Python and use numpy instead. – Mad Physicist Jun 13 '17 at 17:21
  • Have you played with functions like `np.cumsum` and `np.cumprod`? – hpaulj Jun 13 '17 at 17:24
  • @MadPhysicist Is the pseudo code not specific enough? I believe I was quite thorough in explaining my goals. I'm only wondering if numpy has a built-in way of doing what I did in the sudo code. – Josh Wiens Jun 13 '17 at 17:25
  • @hpaulj I have used cumsum for some applications, but when trying to use it with things that measure movement information (such as standard deviation), I don't think it really works. – Josh Wiens Jun 13 '17 at 17:29
  • Meant to say "pseudo" (noticed after 5 minutes), the terminal is getting to me. – Josh Wiens Jun 13 '17 at 17:37
  • If that operation is expressible as NumPy ufuncs, you could create sliding windows, as shown [`here`](https://stackoverflow.com/a/40085052/3293881), and then use those ufuncs along the last axis of the 2D array of sliding windows (a sketch of this appears just after these comments). If there's a specific operation that you are after, we could find better optimized ways. – Divakar Jun 13 '17 at 17:40
  • @JoshWiens. My fault entirely. I misread: forgot to read the end of your sentence "which calculates the average of the group of numbers passed to it". I thought AVG was a placeholder for *any* function at all. – Mad Physicist Jun 13 '17 at 18:13
  • Couple of dupes: https://stackoverflow.com/q/14313510/2988730, https://stackoverflow.com/q/13728392/2988730 – Mad Physicist Jun 13 '17 at 18:17
  • By the way, did you mean `A[i-N:i+1]` in your question? The stop index is exclusive. – Mad Physicist Jun 13 '17 at 18:22
  • possible duplicate of https://stackoverflow.com/q/13728392/52074 – Trevor Boyd Smith Apr 24 '19 at 13:23
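
Following up on the sliding-windows comment above, here is a sketch of that idea using numpy's sliding_window_view (available in numpy 1.20+; the linked answer builds the windows with as_strided instead):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 5
A = np.random.rand(100)

# Each row of W is one length-N window of A; W has shape (len(A) - N + 1, N).
W = sliding_window_view(A, N)

# Any reduction along the last axis yields a rolling statistic.
rolling_mean = W.mean(axis=-1)
rolling_std = W.std(axis=-1)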

3 Answers


Check out the standard moving window functions in pandas. For example, the moving average with window size 10 is pd.rolling_mean(data, window=10). (Note that these top-level functions were deprecated and later removed; in modern pandas the equivalent is data.rolling(window=10).mean().)

You can also provide your own aggregation function with pd.rolling_apply(data, lambda x: np.mean(x), window=10), which gives the same result as the previous call (modern equivalent: data.rolling(window=10).apply(np.mean)).
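
As a concrete sketch of the modern rolling API (assuming data is a pandas Series):

import numpy as np
import pandas as pd

data = pd.Series(np.random.rand(100))

# Moving average over a 10-sample window; the first 9 entries are NaN
# because no full window exists there yet.
avg = data.rolling(window=10).mean()

# The same thing via a user-supplied aggregation function.
avg2 = data.rolling(window=10).apply(np.mean)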

    While Pandas is definitely not off the table, right now I'm really specifically wondering about numpy. That does look like exactly what I want to do though in the end, thanks for the tip! – Josh Wiens Jun 13 '17 at 17:32
  • I am accepting this answer despite it not quite adhering to the initial question. @Mad Physicist's answer is a good solution for the specific question asked, but given a project where you would like to apply a multitude of different processing algorithms, this solution is drastically more applicable. Pandas is worth using for this function alone imo. – Josh Wiens Jun 13 '17 at 22:24

A bit low-level, but you can filter the data by cross-correlating it with a window of your choosing. A moving-average window is just a bunch of ones divided by the number of ones. Note that correlate has various "modes", and the validity of the start/end points varies between them.

import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt

window_size = 10
# Moving-average kernel: ones normalized to sum to 1.
window = np.ones(window_size) / window_size
x = np.random.rand(100)

# mode='same' gives an output the same length as x; points near the
# edges are computed from partial (zero-padded) windows.
x_filt = signal.correlate(x, window, mode='same')

f, ax = plt.subplots()
ax.plot(x)
ax.plot(x_filt)
plt.show()
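
Since the boxcar window here is symmetric, correlation and convolution coincide, so a pure-numpy equivalent (no scipy required) would be:

x_filt_np = np.convolve(x, window, mode='same')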

Nick T

The simplest pure numpy solution that does not use convolution is the one using np.cumsum. The basic idea is that the sum of the N elements ending at index i (that is, A[i-N+1] through A[i]) is the cumulative sum up to i minus the cumulative sum up to i - N. The normalization is just N itself:

s = np.cumsum(A)
# s[N:] - s[:-N] is the sum over each window of N consecutive elements
B = (s[N:] - s[:-N]) / N

It is not clear if you want B to be the same length as A. If so, you could, for example, prepend the partial averages of the first N elements (the first N cumulative sums, each divided by the number of elements it covers) using np.concatenate or np.r_:

B = np.concatenate((s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N))

OR

B = np.r_[s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N]

After writing this, I realized that @Jaime has a very similar answer to basically the same question here. I am going to retain my answer because it correctly normalizes the initial portion of the array, which I am not convinced Jaime's answer does.
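
A quick sanity check of this against a naive loop (illustrative; A and N are placeholder names):

import numpy as np

N = 5
A = np.random.rand(20)

s = np.cumsum(A)
B = np.r_[s[:N] / np.arange(1, N + 1), (s[N:] - s[:-N]) / N]

# Naive loop: average the trailing window of up to N elements at each index.
loop = np.array([A[max(0, i - N + 1):i + 1].mean() for i in range(len(A))])
assert np.allclose(B, loop)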

Mad Physicist