
Is there a simple way to calculate the mean of several (same-length) lists in Python? Say I have [[1, 2, 3], [5, 6, 7]] and want to obtain [3, 4, 5]. This needs to be done 100000 times, so I want it to be fast.

Kenan Banks
David Tan

  • How do you get `4` for the first element? – NPE Dec 01 '12 at 17:15
  • NumPy arrays are likely to be faster here than pure Python. Otherwise there really is no "fast" way of doing it, except doing it. And 100000 times isn't really *that* many. – Lennart Regebro Dec 01 '12 at 17:15
  • @LennartRegebro: I've just done some benchmarks, and on such a small input `numpy.average()` is 10x slower than a simple list comprehension. Pretty surprising. – NPE Dec 01 '12 at 17:20
  • @NPE I did mean NumPy (fixed). And no, that's not surprising at all. The point here is that he has an array, and he slices it "vertically", so to speak. NumPy has array objects that can do that, while in pure Python you have lists of lists. When he says "100000" I assume he means the size of the array. – Lennart Regebro Dec 01 '12 at 17:23

4 Answers


In case you're using NumPy (which seems to be more appropriate here):

>>> import numpy as np
>>> data = np.array([[1, 2, 3], [5, 6, 7]])
>>> np.average(data, axis=0)
array([ 3.,  4.,  5.])
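If you'd rather end up with a plain Python list, you can convert the result back; a small sketch (note that `np.mean(..., axis=0)` gives the same unweighted result as `np.average` here):

```python
import numpy as np

data = np.array([[1, 2, 3], [5, 6, 7]])
means = np.mean(data, axis=0)   # column-wise mean, same as np.average without weights
result = means.tolist()          # back to a plain list: [3.0, 4.0, 5.0]
```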
arshajii
In [6]: l = [[1, 2, 3], [5, 6, 7]]

In [7]: [(x+y)/2 for x,y in zip(*l)]
Out[7]: [3, 4, 5]

(You'll need to decide whether you want integer or floating-point maths, and which kind of division to use.)
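To illustrate the choice (on Python 3, where `/` is always true division and `//` is floor division):

```python
l = [[1, 2, 3], [5, 6, 7]]

# True division: floats, even when the result is exact.
float_means = [(x + y) / 2 for x, y in zip(*l)]   # [3.0, 4.0, 5.0]

# Floor division: ints, rounding down when the sum is odd.
int_means = [(x + y) // 2 for x, y in zip(*l)]    # [3, 4, 5]
```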

On my computer, the above takes 1.24us:

In [11]: %timeit [(x+y)/2 for x,y in zip(*l)]
1000000 loops, best of 3: 1.24 us per loop

Thus processing 100,000 inputs would take 0.124s.

Interestingly, NumPy arrays are slower on such small inputs:

In [27]: a = np.array(l)

In [28]: %timeit (a[0] + a[1]) / 2
100000 loops, best of 3: 5.3 us per loop

In [29]: %timeit np.average(a, axis=0)
100000 loops, best of 3: 12.7 us per loop

If the inputs get bigger, the relative timings will no doubt change.
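A quick way to check this for yourself at a larger (hypothetical) size, using `timeit`:

```python
import timeit
import numpy as np

# Hypothetical larger input: two rows of 100,000 elements each.
n = 100_000
l = [list(range(n)), list(range(n))]
a = np.array(l)

py_time = timeit.timeit(lambda: [(x + y) / 2 for x, y in zip(*l)], number=10)
np_time = timeit.timeit(lambda: (a[0] + a[1]) / 2, number=10)
# For rows this long, the vectorized NumPy version should win comfortably,
# since its per-call overhead is amortized over many elements.
```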

NPE
  • This is assuming he has two lists with 100000 items. It is possible to interpret the question like that, but somehow I doubt that this is what he wants. – Lennart Regebro Dec 01 '12 at 17:24
  • @LennartRegebro: To me, *"This is to be doing 100000 times"* means many inputs rather than long inputs. However, we could certainly do with a clarification from the OP on this. – NPE Dec 01 '12 at 17:26

Extending NPE's answer, for a list containing n sublists which you want to average, use this (a NumPy solution might be faster, but mine uses only built-ins):

def average(l):
    llen = len(l)
    def divide(x): return x / llen
    # Note: on Python 2, map returns a list and / truncates for ints;
    # on Python 3, wrap the result in list() and / yields floats.
    return map(divide, map(sum, zip(*l)))

This sums up all sublists and then divides the result by the number of sublists, producing the average. You could inline the len computation and turn divide into a lambda like lambda x: x / len(l), but using an explicit function and pre-computing the length should be a bit faster.
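On Python 3, where `map` returns a lazy iterator and `/` is true division, an equivalent built-ins-only version (a sketch, not the answerer's original) could be:

```python
def average(l):
    # Sum each "column" across the sublists, then divide by the row count.
    n = len(l)
    return [s / n for s in map(sum, zip(*l))]

result = average([[1, 2, 3], [5, 6, 7]])   # [3.0, 4.0, 5.0]
```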

l4mpi

A slightly modified version that works smoothly with RGB pixels:

def average(*l):
    # *l is already a tuple of the argument lists, so no conversion is needed.
    def divide(x): return x // len(l)
    return list(map(divide, map(sum, zip(*l))))

print(average([0, 20, 200], [100, 40, 100]))
# [50, 30, 150]
Adrian Mole