Running mean of numpy ndarrays from iterator

Question

The question of how to compute a running mean of a series of numbers has been asked and answered before. However, I am trying to compute the running mean of a series of ndarrays whose length is not known in advance. For example, given an iterator `data`, I would do:

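import numpy as np
running_mean = np.zeros((1000, 3))
while True:
    try:
        datum = next(data)
    except StopIteration:
        break  # the iterator is exhausted
    running_mean = calc_running_mean(datum)  # what should this function be?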

What would `calc_running_mean` look like? My primary concern is memory: I can't hold all of the data in memory, and I don't know how much data I will receive. Each `datum` is an ndarray (for this example, a (1000,3) array), and the running mean should be an array of the same shape in which each element is the mean of all values seen so far at that position.

The key difference from previous questions is that this one asks for the elementwise mean of a series of ndarrays, and the number of arrays is unknown.
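A minimal sketch of one possible `calc_running_mean`, assuming it is acceptable to keep a small state object between calls (the signature in the question takes only `datum`, so the count and current mean have to live somewhere). `RunningMean` is a hypothetical name, and it uses the standard incremental-mean update rather than the running-sum approach of the accepted answer. Only the current mean and a count are kept in memory, regardless of how many arrays arrive:

```python
import numpy as np


class RunningMean:
    """Hypothetical helper (not from the question or answer) that keeps only
    the current elementwise mean and a count in memory."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, datum):
        datum = np.asarray(datum, dtype=float)
        self.count += 1
        if self.mean is None:
            # First array: copy it so later in-place updates don't modify the caller's data.
            self.mean = datum.copy()
        else:
            # Incremental update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
            self.mean += (datum - self.mean) / self.count
        # Note: the same array object is updated in place on every call.
        return self.mean


# Usage with an iterator that is driven only by next(), as in the question.
rng = np.random.default_rng(0)
data = (rng.standard_normal((1000, 3)) for _ in range(50))

rm = RunningMean()
while True:
    try:
        datum = next(data)
    except StopIteration:
        break  # the stream is exhausted
    running_mean = rm.update(datum)

print(rm.count, running_mean.shape)
```

Because `update` returns the same array every time, copy the result (`running_mean.copy()`) if you need to keep a snapshot of an intermediate mean.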

Paul Panzer · Accepted Answer · 2018-06-13T20:56:23.263

You can combine `itertools.accumulate` and `itertools.count` with the standard `operator.truediv`:

>>> import itertools, operator
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
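For readers less familiar with `itertools.accumulate`, the two lines above behave roughly like the following generator. This is a sketch written for illustration, and `running_means` is a hypothetical name. Like the `accumulate`/`map` pipeline, it is lazy and keeps only the current sum, so memory use does not grow with the number of arrays:

```python
import numpy as np


def running_means(data):
    """Yield the elementwise mean of all arrays seen so far (hypothetical helper)."""
    total = None
    for n, datum in enumerate(data, start=1):
        # Like itertools.accumulate(data): the first array, then total + datum.
        total = datum if total is None else total + datum
        # Like map(operator.truediv, running_sum, itertools.count(1)).
        yield total / n


for m in running_means(np.linspace(-i, i * i, 6) for i in range(3)):
    print(m)
```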

Example:

>>> import numpy as np
>>> data = (np.linspace(-i, i*i, 6) for i in range(10))
>>> 
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
>>> 
>>> for i in running_mean:
...     print(i)
... 
[0. 0. 0. 0. 0. 0.]
[-0.5 -0.3 -0.1  0.1  0.3  0.5]
[-1.         -0.46666667  0.06666667  0.6         1.13333333  1.66666667]
[-1.5 -0.5  0.5  1.5  2.5  3.5]
[-2.  -0.4  1.2  2.8  4.4  6. ]
[-2.5        -0.16666667  2.16666667  4.5         6.83333333  9.16666667]
[-3.   0.2  3.4  6.6  9.8 13. ]
[-3.5  0.7  4.9  9.1 13.3 17.5]
[-4.          1.33333333  6.66666667 12.         17.33333333 22.66666667]
[-4.5  2.1  8.7 15.3 21.9 28.5]
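If only the final mean is needed, the lazy `running_mean` iterator can be drained without storing the intermediate results. One way to do that (an addition, not part of the original answer) is a `collections.deque` with `maxlen=1`, which retains just the most recent item. The printed value should equal the last row of the output above:

```python
import collections
import itertools
import operator

import numpy as np

data = (np.linspace(-i, i * i, 6) for i in range(10))
running_sum = itertools.accumulate(data)
running_mean = map(operator.truediv, running_sum, itertools.count(1))

# Consume the iterator, keeping only the latest running mean in memory.
last = collections.deque(running_mean, maxlen=1)
final_mean = last[0] if last else None  # None if data was empty
print(final_mean)
```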

answered Jun 13 '18 at 20:48


This does solve the problem posed in the original question. However, the object I'm receiving the data from isn't actually an iterator and isn't compatible with these functions. Is there a way to do this with a generic object that emits numpy arrays when you call `next()` on it? – Hal T Jun 18 '18 at 18:26
@HalT if you have control of this object, I think the easiest fix would be to add an `__iter__` method that does nothing except return `self`. If not, the obvious (if ugly) workaround would be to wrap the object in a class that just proxies the `__next__` method and defines `__iter__` as just described. – Paul Panzer Jun 18 '18 at 19:33
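A minimal sketch of the wrapper approach described in the comment above. `ArraySource` is a hypothetical stand-in for the asker's object: it defines `__next__` but not `__iter__`, so passing it directly to `itertools.accumulate` raises `TypeError`. `IterProxy` is likewise a hypothetical name for the wrapper:

```python
import itertools
import operator

import numpy as np


class ArraySource:
    """Hypothetical stand-in for the asker's object: it supports next()
    (defines __next__) but not iter(), so itertools rejects it."""

    def __init__(self, n):
        self._i = 0
        self._n = n

    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        self._i += 1
        return np.full((2, 3), float(self._i))


class IterProxy:
    """Wrapper as suggested in the comment: proxy __next__, return self from __iter__."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._source)


source = IterProxy(ArraySource(4))
running_sum = itertools.accumulate(source)
running_mean = map(operator.truediv, running_sum, itertools.count(1))
means = list(running_mean)  # list() only for this small demo; iterate lazily in real use
print(len(means), means[-1][0])
```

If you can edit the original class instead, adding the same two-line `__iter__` method to it makes the wrapper unnecessary.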
