
Is there a simple way to calculate the mean of several (same-length) lists in Python? Say I have [[1, 2, 3], [5, 6, 7]] and want to obtain [3, 4, 5]. This needs to be done 100000 times, so I want it to be fast.

Kenan Banks
David Tan

  • How do you get `4` for the first element? – NPE Dec 01 '12 at 17:15
  • NumPy arrays are likely to be faster here than pure Python. Otherwise there really is no "fast" way of doing it, except doing it. And 100000 times isn't really *that* many. – Lennart Regebro Dec 01 '12 at 17:15
  • @LennartRegebro: I've just done some benchmarks, and on such a small input `numpy.average()` is 10x slower than a simple list comprehension. Pretty surprising. – NPE Dec 01 '12 at 17:20
  • @NPE I did mean NumPy (fixed). And no, that's not surprising at all. The point here is that he has an array, and he slices it "vertically", so to speak. NumPy has array objects that can do that, while in pure Python you have lists of lists. When he says "100000" I assume he means the size of the array. – Lennart Regebro Dec 01 '12 at 17:23

4 Answers


In case you're using NumPy (which seems to be more appropriate here):

>>> import numpy as np
>>> data = np.array([[1, 2, 3], [5, 6, 7]])
>>> np.average(data, axis=0)
array([ 3.,  4.,  5.])
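If you'd rather end up with a plain Python list, you can convert the result back; a small sketch (note that `np.mean(..., axis=0)` gives the same unweighted result as `np.average` here):

```python
import numpy as np

data = np.array([[1, 2, 3], [5, 6, 7]])
means = np.mean(data, axis=0)   # column-wise mean, same as np.average without weights
result = means.tolist()          # back to a plain list: [3.0, 4.0, 5.0]
```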
arshajii
In [6]: l = [[1, 2, 3], [5, 6, 7]]

In [7]: [(x+y)/2 for x,y in zip(*l)]
Out[7]: [3, 4, 5]

(You'll need to decide whether you want integer or floating-point maths, and which kind of division to use.)
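To illustrate the choice (on Python 3, where `/` is always true division and `//` is floor division):

```python
l = [[1, 2, 3], [5, 6, 7]]

# True division: floats, even when the result is exact.
float_means = [(x + y) / 2 for x, y in zip(*l)]   # [3.0, 4.0, 5.0]

# Floor division: ints, rounding down when the sum is odd.
int_means = [(x + y) // 2 for x, y in zip(*l)]    # [3, 4, 5]
```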

On my computer, the above takes 1.24us:

In [11]: %timeit [(x+y)/2 for x,y in zip(*l)]
1000000 loops, best of 3: 1.24 us per loop

Thus processing 100,000 inputs would take 0.124s.

Interestingly, NumPy arrays are slower on such small inputs:

In [27]: a = np.array(l)

In [28]: %timeit (a[0] + a[1]) / 2
100000 loops, best of 3: 5.3 us per loop

In [29]: %timeit np.average(a, axis=0)
100000 loops, best of 3: 12.7 us per loop

If the inputs get bigger, the relative timings will no doubt change.
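A quick way to check this for yourself at a larger (hypothetical) size, using `timeit`:

```python
import timeit
import numpy as np

# Hypothetical larger input: two rows of 100,000 elements each.
n = 100_000
l = [list(range(n)), list(range(n))]
a = np.array(l)

py_time = timeit.timeit(lambda: [(x + y) / 2 for x, y in zip(*l)], number=10)
np_time = timeit.timeit(lambda: (a[0] + a[1]) / 2, number=10)
# For rows this long, the vectorized NumPy version should win comfortably,
# since its per-call overhead is amortized over many elements.
```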

NPE
  • This is assuming he has two lists with 100000 items. It is possible to interpret the question like that, but somehow I doubt that this is what he wants. – Lennart Regebro Dec 01 '12 at 17:24
  • @LennartRegebro: To me, *"This is to be doing 100000 times"* means many inputs rather than long inputs. However, we could certainly do with a clarification from the OP on this. – NPE Dec 01 '12 at 17:26

Extending NPE's answer, for a list containing n sublists which you want to average, use this (a NumPy solution might be faster, but mine uses only built-ins):

def average(l):
    llen = len(l)
    def divide(x): return x / llen
    # Note: on Python 2, map returns a list and / truncates for ints;
    # on Python 3, wrap the result in list() and / yields floats.
    return map(divide, map(sum, zip(*l)))

This sums up all sublists and then divides the result by the number of sublists, producing the average. You could inline the len computation and turn divide into a lambda like lambda x: x / len(l), but using an explicit function and pre-computing the length should be a bit faster.
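On Python 3, where `map` returns a lazy iterator and `/` is true division, an equivalent built-ins-only version (a sketch, not the answerer's original) could be:

```python
def average(l):
    # Sum each "column" across the sublists, then divide by the row count.
    n = len(l)
    return [s / n for s in map(sum, zip(*l))]

result = average([[1, 2, 3], [5, 6, 7]])   # [3.0, 4.0, 5.0]
```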

l4mpi

A slightly modified version that works smoothly with RGB pixels:

def average(*l):
    # *l is already a tuple of the argument lists, so no conversion is needed.
    def divide(x): return x // len(l)
    return list(map(divide, map(sum, zip(*l))))

print(average([0, 20, 200], [100, 40, 100]))
# [50, 30, 150]
Adrian Mole