9

What I want to do is generate a numpy array that is the cumulative sum of another numpy array given a certain window.

For example, given an array [1,2,3,4,5,6,7,8,9,10,11,12] let's say I want a cumulative sum with a window of 3. What I want as out put would be [1,3,6,9,12,15,18,21,24,27,30,33]. I have a relatively large numpy array and would like to do a cumulative sum with a window of 400.

Alex Riley
  • 169,130
  • 45
  • 262
  • 238
Rtrader
  • 917
  • 4
  • 11
  • 26

4 Answers4

34

Here is perhaps a simpler answer, based on subtracting shifted cumsums.

>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
>>> b = a.cumsum()
>>> b[3:] = b[3:] - b[:-3]
>>> b
array([ 1,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33])
andrew
  • 1,843
  • 20
  • 19
  • 6
    For others who may read these answers: This is the canonical way to do this in numpy, for whatever it's worth. If speed matters, it's _much_ faster than any of the other solutions. You're fundamentally doing many fewer calculations that explicitly re-calculating sum for each window. – Joe Kington Apr 09 '15 at 14:41
  • I looked for HOURS for a solution like this... Other answers on Stack Overflow are much higher rated, but this is MUCH faster. Indeed, the canonical numpy way (it even wraps around the original array!). Kudos to you, sir. – étale-cohomology Feb 19 '16 at 06:51
  • 1
    The third line: what if I write `b[3:] = b[3:] - b[:-3]`? Any reason for using `a.cumsum()[:-3]` instead of `b[:-3]`? – Frozen Flame Sep 22 '16 at 06:38
  • @FrozenFlame, you're right, as far as I can see, `b[3:] = b[3:] - b[:-3]` works fine. However, watch out, `b[3:] -= b[:-3]` does not work as it modifies elements before they are read. – andrew Nov 18 '16 at 21:43
8

You should likely use numpy, unless you really don't care about speed (though I would prefere it anyways). So you could use convolve or stride_tricks based approaches (these are not obvious, but solve these things nicely).

For example given a function like this (you can find more and fancier versions too):

def embed(array, dim, lag=1):
    """Create an embedding of array given a resulting dimension and lag.
    The array will be raveled before embedding.
    """
    array = np.asarray(array)
    array = array.ravel()
    new = np.lib.stride_tricks.as_strided(array,
                                     (len(array)-dim*lag+lag, dim),
                                     (array.strides[0], array.strides[0]*lag))
    return new

You can do:

embedded = embed(array, 400)
result = embedded.sum(1)

Which is memory efficient (the embedding or whatever you call it, does only create a view) and fast. The other approach of course would be to use convolve:

np.convolve(array, np.ones(400), mode='valid')

I don't know if you want the non full windows too, this would be the same as using mode='full' (default) for the convolve. For the other approach, that would have to be handled some other way.

seberg
  • 8,785
  • 2
  • 31
  • 30
3
In [42]: lis=[1,2,3,4,5,6,7,8,9,10,11,12]

In [43]: w=3       #window size

In [44]: [sum(lis[i-(w-1):i+1]) if i>(w-1) else sum(lis[:i+1])  for i in range(len(lis))]
Out[44]: [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33]

In [45]: w=4

In [46]: [sum(lis[i-(w-1):i+1]) if i>(w-1) else sum(lis[:i+1])  for i in range(len(lis))]
Out[46]: [1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42]

for python 2.4 or less, change the ternary operator:

(falseValue, trueValue)[condition] instead of trueValue if condition else falseValue

[(sum(lis[:i+1]),sum(lis[i-(w-1):i+1]))[i>(w-1)]  for i in range(len(lis))]
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
  • 1
    @user1440194: you shouldn't be using Python 2.4 anymore. It's ancient. Even 2.5 has reached end of life and is no longer getting security updates. – Fred Foo Oct 03 '12 at 13:55
  • 1
    @Ashwini Chaudhary, Shouldn't you better use `itertools.accumlate()`? – LarsVegas Oct 03 '12 at 13:58
  • @larsvegas `accumulate()` is introduced in python 3.2, I am on python 2.6 and OP is using python 2.4. – Ashwini Chaudhary Oct 03 '12 at 14:03
  • @user1440194 can you post the traceback of `syntaxError` you're getting? – Ashwini Chaudhary Oct 03 '12 at 14:04
  • @Ashwini Chaudhary, Well, he is using `NumPy` so `numpy.cumsum()` is the way to go. Or just define your cumsum function. – LarsVegas Oct 03 '12 at 14:06
  • @AshwiniChaudhary `File "", line 1 windowedcumOFI = [sum(OFIs[i-(w-1):i+1]) if i>(w-1) else sum(OFIs[:i+1] for i in range(len(OFIs))] ^ SyntaxError: invalid syntax` thanks though I just did a normal loop instead of a list comprehension and it worked. The error occurs at the if – Rtrader Oct 03 '12 at 14:07
  • It is called a `list comprehension`. – LarsVegas Oct 03 '12 at 14:09
  • @user1440194 the problem is with the ternary operator, python 2.4 supports different type of ternary operator. http://www.evanfosmark.com/2008/06/ternary-operator-in-python-people-got-clever/ – Ashwini Chaudhary Oct 03 '12 at 14:13
  • You could probably do something like this with a combination of convolve and cumsum also. – reptilicus Oct 03 '12 at 14:26
  • I think `[sum(lis[max(0,i-w+1):i+1]) for i in range(len(lis))]` is a bit neater than using that ternary construct. – Junuxx Oct 03 '12 at 18:07
1

seberg's answer is better and more general than mine, but note that you need to zero-pad your samples to get the result you want.

import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
samples = 100
window = 3
padding = np.zeros(window - 1)
# zero-pad your samples
a = np.concatenate([padding,np.arange(1,samples + 1)])
newshape = (len(a) - window,window)
newstrides = a.strides * 2
# this gets you a sliding window of size 3, with a step of 1
strided = ast(a,shape = newshape,strides = newstrides)
# get your moving sum
strided.sum(1)
Community
  • 1
  • 1
John Vinyard
  • 12,997
  • 3
  • 30
  • 43