257

I notice that both calls return the same value:

In [30]: np.mean([1, 2, 3])
Out[30]: 2.0

In [31]: np.average([1, 2, 3])
Out[31]: 2.0

However, there should be some differences, since after all they are two different functions.

What are the differences between them?

kmario23
Sibbs Gambling
  • 28
    Actually, the documentation doesn't make it immediately clear, as far as I can see. Not saying it is impossible to tell, but I think this question is valid for Stack Overflow all the same. – BlackVegetable Nov 18 '13 at 17:47
  • 1
    numpy.mean : Returns the average of the array elements. – joaquin Nov 18 '13 at 17:47
  • 2
    @joaquin: "Compute the arithmetic mean along the specified axis." vs "Compute the weighted average along the specified axis."? – Blender Nov 19 '13 at 00:01
  • @Blender right. I was just trying to make a kind of funny response to your comment because if I follow your instructions the first thing I read in the [docs for numpy.mean](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html) is *numpy.mean : Returns the average of the array elements* which is funny if you are looking for the answer to the OP question. – joaquin Nov 19 '13 at 16:05

5 Answers

234

np.average takes an optional weights parameter. If it is not supplied, the two functions are equivalent. Take a look at the source code (Mean, Average):

np.mean:

try:
    mean = a.mean
except AttributeError:
    return _wrapit(a, 'mean', axis, dtype, out)
return mean(axis, dtype, out)

np.average:

...
if weights is None :
    avg = a.mean(axis)
    scl = avg.dtype.type(a.size/avg.size)
else:
    #code that does weighted mean here

if returned: #returned is another optional argument
    scl = np.multiply(avg, 0) + scl
    return avg, scl
else:
    return avg
...
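
To make the two branches in the excerpt concrete, here is a minimal sketch. The array and weights are made-up values for illustration only. It shows that without weights np.average gives the same value as np.mean, and that returned=True additionally hands back the sum of the weights.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative data
w = np.array([1.0, 1.0, 1.0, 5.0])   # illustrative weights

# Without weights, np.average falls back to the plain arithmetic mean.
unweighted = np.average(a)
plain_mean = np.mean(a)

# With weights, the result is sum(a * w) / sum(w).
weighted = np.average(a, weights=w)

# returned=True gives back a tuple: (average, sum of weights).
avg, weight_sum = np.average(a, weights=w, returned=True)

print(unweighted, plain_mean, weighted, avg, weight_sum)
```

Here the weighted result is pulled toward 4.0 because that element carries most of the weight.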
Hammer
  • 82
    Why do they offer two different functions? Seems they should just offer `np.average` since `weights` is already optional. Seems unnecessary and only serves to confuse users. – Geoff Nov 30 '15 at 22:03
  • 12
    @Geoff I would rather have them throw a NotImplementedException for "average", to educate users that the arithmetic mean is not identical to "the average". – FooBar Jun 26 '18 at 11:15
47

np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).

np.average can compute a weighted average if the weights parameter is supplied.
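
As a rough illustration of the options mentioned above (the data, weights, and buffer name are invented for this sketch), np.mean accepts dtype and out, while np.average accepts weights along an axis:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])            # illustrative integer data

# np.mean: choose the accumulator dtype and write into a preallocated buffer.
buf = np.empty(3, dtype=np.float64)  # hypothetical output buffer
np.mean(a, axis=0, dtype=np.float64, out=buf)

# np.average: one weight per row, applied along axis 0.
row_weights = [3, 1]                 # illustrative weights
weighted_cols = np.average(a, axis=0, weights=row_weights)

print(buf)            # column means
print(weighted_cols)  # column averages with the first row weighted 3x
```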

Amber
32

In some versions of NumPy there is another important difference that you must be aware of:

average does not take masks into account, so it computes the average over the whole set of data.

mean takes masks into account, so it computes the mean only over the unmasked values.

g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)

np.average(f)
Out: 34.0

np.mean(f)
Out: 2.0
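
If the behaviour of np.average on masked arrays is a concern in your NumPy version, one option is to be explicit about which behaviour you want. The sketch below reuses the same data; the variable names are illustrative. It uses np.ma.average for a mask-aware result and passes the raw .data array when the mask should deliberately be ignored.

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)    # masks 55, 66 and 77

masked_mean = np.mean(f)          # mean over the unmasked values 1, 2, 3
masked_avg = np.ma.average(f)     # mask-aware average
raw_avg = np.average(f.data)      # underlying data, mask deliberately ignored

print(masked_mean, masked_avg, raw_avg)
```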
G M
  • 3
    Note: `np.ma.average` works. Also, there is a [bug report](https://github.com/numpy/numpy/issues/7330). – Neil G Mar 29 '17 at 01:53
  • 2
    `np.average` and `np.mean` both take masks into account. I tried it and got "Out: `2.0`" from both. – CEB Jun 30 '22 at 14:40
  • @CEB A newer version has probably fixed the bug. Thanks for reporting. – G M Jun 30 '22 at 16:42
13

In addition to the differences already noted, there's another extremely important difference that I just now discovered the hard way: unlike np.mean, np.average doesn't allow the dtype keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5 file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64':

>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')

m1 = np.average(T, axis=(0,1))                #  garbage
m2 = np.mean(T, axis=(0,1))                   #  the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64')  # correct results

Unfortunately, unless you know what to look for, you can't necessarily tell your results are wrong. I will never use np.average again for this reason but will always use np.mean(.., dtype='float64') on any large array. If I want a weighted average, I'll compute it explicitly using the product of the weight vector and the target array and then either np.sum or np.mean, as appropriate (with appropriate precision as well).
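
A minimal sketch of that approach follows, using a small random float32 array as a stand-in for the large HDF5 dataset. The shapes, the seed, and the weight array w are assumptions made for illustration. At this size the precision problem itself will not show up; the point is only the pattern of forcing double-precision accumulation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((64, 64, 5), dtype=np.float32)  # small stand-in for the real array
w = rng.random((64, 64))                       # hypothetical float64 weights

# Plain mean over axes 0 and 1, accumulated in double precision.
m = np.mean(T, axis=(0, 1), dtype=np.float64)

# Explicit weighted average: broadcast the weights over the last axis,
# sum in float64, then divide by the total weight.
# Note: T * w[:, :, None] builds a full-size float64 temporary.
weighted = np.sum(T * w[:, :, None], axis=(0, 1), dtype=np.float64) / w.sum()

print(m.dtype, m.shape, weighted.shape)
```

For an array as large as the one described above, the temporary product may not fit in memory, so the sum would likely need to be done in chunks.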

Grant Petty
4

In your invocation, the two functions are the same.

average can compute a weighted average though.

Doc links: mean and average

Prashant Kumar