
I want to find the fastest way to compute the average of Python lists. I have millions of lists stored in a dictionary, so I am looking for the most efficient approach in terms of performance.

Referring to this question, if l is a list of float numbers, I have:

  • numpy.mean(l)
  • sum(l) / float(len(l))
  • reduce(lambda x, y: x + y, l) / len(l)

Which way would be the fastest?

ewr3243

2 Answers


As @DeepSpace has suggested, you should try to answer this question yourself. You might also consider converting your list to a numpy array before calling numpy.mean. Use %timeit with IPython as follows:

In [1]: import random
In [2]: import numpy
In [3]: from functools import reduce
In [4]: l = random.sample(range(0, 100), 50) # generates a random list of 50 elements
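
If you are not in IPython, the standard-library timeit module measures the same thing; a minimal sketch (the loop count of 10,000 is arbitrary, and it prints total seconds for all loops rather than per-call time):

import timeit
import random
import numpy

l = random.sample(range(0, 100), 50)

# Total seconds for 10,000 evaluations of each candidate expression.
print(timeit.timeit('numpy.mean(l)', globals=globals(), number=10000))
print(timeit.timeit('sum(l) / len(l)', globals=globals(), number=10000))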

numpy.mean without converting to an np.array

In [5]: %timeit numpy.mean(l)
32.5 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

numpy.mean converting to an np.array

In [5]: a = numpy.array(l)
In [6]: %timeit numpy.mean(a)
17.6 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
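
Note that this timing excludes the cost of the conversion itself, which matters if your data starts out as lists; you can measure it the same way (no representative numbers implied):

In [7]: %timeit numpy.array(l)  # conversion overhead, paid once per list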

sum(l) / float(len(l))

In [5]: %timeit sum(l) / float(len(l))  # the float cast is not required in Python 3
774 ns ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

sum(l) / len(l)

In [5]: %timeit sum(l) / len(l)
623 ns ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

reduce

In [6]: %timeit reduce(lambda x, y: x + y, l) / len(l)
5.92 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
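
As an aside (not part of the ranking below), the standard library's statistics.mean can be timed the same way; it is typically much slower than sum(l) / len(l) because it does careful exact accumulation:

In [7]: from statistics import mean
In [8]: %timeit mean(l)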

From slowest to fastest:

  1. numpy.mean(l) without converting to array
  2. numpy.mean(a) after converting the list to an np.array
  3. reduce(lambda x, y: x + y, l) / len(l)
  4. sum(l) / float(len(l)) # the cast works in both Python 2 and 3
  5. sum(l) / len(l) # in Python 3, no float cast is needed
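
Applied to the situation in the question (millions of lists stored in a dictionary), the fastest option translates into a plain dict comprehension; a minimal sketch, where data is a made-up name assumed to map keys to non-empty lists of floats:

# Hypothetical layout: data maps each key to a non-empty list of floats.
averages = {key: sum(values) / len(values) for key, values in data.items()}
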
lmiguelvargasf
  • To be fair, if you are going to consider using `numpy`, you'll probably have already stored the values in an array, rather than a list. With `a = numpy.array(l)`, `numpy.mean(a)` is twice as fast as `numpy.mean(l)`. I haven't tried longer inputs to see how each scales, either. – chepner Sep 19 '19 at 21:09
  • @chepner, I agree with you. I will add that to my answer. – lmiguelvargasf Sep 19 '19 at 21:11
  • @chepner, I updated my answer. I agree with you that after converting to an array, it becomes about twice as fast. – lmiguelvargasf Sep 19 '19 at 21:21
  • Please try to avoid the use of el (l) as a variable name, especially here on SO. It looks more like the number 1 in the code font. Case in point, in this comment font, it looks like a capital i (I). – Starman Jun 16 '22 at 14:07
  • I wonder about the relative speed of `statistics.mean()`. – ChaimG Jan 13 '23 at 16:59

Good afternoon. I just ran a timing test on a list of 10 floats and found numpy to be the fastest.

#!/usr/bin/python

import numpy as np
from functools import reduce
import time

l = [0.1, 2.3, 23.345, 0.9012, .002815, 8.2, 13.9, 0.4, 3.02, 10.1]

def test1():
    return np.mean(l)

def test2():
    return sum(l) / float(len(l))

def test3():
    return reduce(lambda x, y: x + y, l) / len(l)

def timed():
    # Time one call to each candidate with wall-clock time.time().
    start = time.time()
    test1()
    print('{} seconds'.format(time.time() - start))
    start = time.time()
    test2()
    print('{} seconds'.format(time.time() - start))
    start = time.time()
    test3()
    print('{} seconds'.format(time.time() - start))

timed()

As always, I'm sure there's a better way to do this, but it does the trick. This was a small list; it would be interesting to see what you find with larger lists.
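
One caveat about the measurement itself: a single call timed with time.time() is dominated by timer resolution and warm-up effects. A steadier variant of the same comparison, sketched with time.perf_counter and a loop (the repetition count is arbitrary, and it reuses test1, test2, and test3 from the script above):

import time

def timed_loop(fn, n=100000):
    # Run fn n times and return the average seconds per call.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

for fn in (test1, test2, test3):
    print('{}: {} seconds per call'.format(fn.__name__, timed_loop(fn)))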