Numpy mean AND variance from single function?

Question

Using Numpy/Python, is it possible to return the mean AND variance from a single function call?

I know that I can compute them separately, but the mean is required to calculate the sample standard deviation. So if I use separate functions to get the mean and variance, I am adding unnecessary overhead.

I have tried looking at the numpy docs here (http://docs.scipy.org/doc/numpy/reference/routines.statistics.html), but with no success.

Why don't you just use numpy.std? Or would you like to calculate something other than the standard deviation? — Greg, Oct 15 '13 at 21:13

askewchan · Accepted Answer · 2013-10-16T00:17:28.250

31

You can't pass a known mean to np.std or np.var; for that you'll have to wait for the new standard library statistics module. In the meantime, you can save a little time by computing the variance directly from the formula:

In [329]: a = np.random.rand(1000)

In [330]: %%timeit
   .....: a.mean()
   .....: a.var()
   .....: 
10000 loops, best of 3: 80.6 µs per loop

In [331]: %%timeit
   .....: m = a.mean()
   .....: np.mean((a-m)**2)
   .....: 
10000 loops, best of 3: 60.9 µs per loop

In [332]: m = a.mean()

In [333]: a.var()
Out[333]: 0.078365856465916137

In [334]: np.mean((a-m)**2)
Out[334]: 0.078365856465916137

If you really are trying to speed things up, try np.dot to do the squaring and summing (since that's what a dot-product is):

In [335]: np.dot(a-m,a-m)/a.size
Out[335]: 0.078365856465916137

In [336]: %%timeit
   .....: m = a.mean()
   .....: c = a-m
   .....: np.dot(c,c)/a.size
   .....: 
10000 loops, best of 3: 38.2 µs per loop
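The np.dot approach above could be wrapped in a small helper that returns both values from one call. This is only a sketch: the name mean_and_var and its ddof parameter are hypothetical (not part of NumPy), and it assumes a 1-D array of real numbers.

```python
import numpy as np

def mean_and_var(a, ddof=0):
    """Return (mean, variance) of a 1-D array, computing the mean only once.

    Hypothetical helper, not a NumPy function. ddof=0 gives the population
    variance (like np.var's default); ddof=1 applies Bessel's correction.
    """
    a = np.asarray(a, dtype=float)
    m = a.mean()
    c = a - m                            # centered values
    v = np.dot(c, c) / (a.size - ddof)   # sum of squares via a dot product
    return m, v

a = np.random.rand(1000)
m, v = mean_and_var(a)
print(np.isclose(m, a.mean()), np.isclose(v, a.var()))
```

Whether this is actually faster than calling a.mean() and a.var() separately depends on the array size and NumPy version, so it is worth timing on your own data.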

edited Oct 16 '13 at 00:17

answered Oct 15 '13 at 21:17

askewchan


5

For future readers: the [statistics module was added in Python 3.4](https://docs.python.org/3/library/statistics.html) and the variance function can be passed an already calculated mean to save processing time. I'm not sure how the performance of it compares with numpy, though. – Tim Tisdall Mar 18 '15 at 17:10
@TimTisdall It seems that this new module is much more precise at the cost of drastic performance degradation as discussed here: https://stackoverflow.com/questions/37533666/why-is-statistics-mean-so-slow/37533841 – Bracula Feb 25 '20 at 15:02
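For reference, a minimal sketch of what the comments above describe, using the standard library statistics module (Python 3.4+). The sample values are made up for illustration; xbar and mu are the keyword names the module uses for a precomputed mean.

```python
import statistics

data = [2.5, 3.25, 5.5, 11.25, 11.75]   # made-up sample values

m = statistics.mean(data)

# Pass the already computed mean so it is not recalculated.
sample_var = statistics.variance(data, xbar=m)   # divides by n - 1
pop_var = statistics.pvariance(data, mu=m)       # divides by n

print(m, sample_var, pop_var)
```

Note that these functions work on plain Python sequences and favour precision over speed, so this may be much slower than NumPy on large arrays.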

score 3 · Answer 2 · answered Apr 04 '19 at 00:32

I don't think NumPy provides a function that returns both the mean and the variance.

However, SciPy provides the function scipy.stats.norm.fit() which returns the mean and standard deviation of a sample. The function is named after its more specific purpose of fitting a normal distribution to a sample.

Example:

>>> import scipy.stats
>>> scipy.stats.norm.fit([1,2,3])
(2.0, 0.81649658092772603)

Note that fit() does not apply Bessel's correction to the standard deviation, so if you want that correction, you have to multiply the result by sqrt(n / (n - 1)), where n is the sample size.
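As a rough illustration of that correction, the sketch below uses np.std with its default ddof=0 as a stand-in for the uncorrected standard deviation (an assumption made here to keep the example NumPy-only; it gives the same value as the fit() output shown above for this sample).

```python
import numpy as np

sample = np.array([1.0, 2.0, 3.0])
n = sample.size

# Uncorrected (population) standard deviation, ddof=0.
std_pop = np.std(sample)

# Apply Bessel's correction by rescaling with sqrt(n / (n - 1)).
std_sample = std_pop * np.sqrt(n / (n - 1))

# Should agree with asking NumPy for ddof=1 directly.
print(np.isclose(std_sample, np.std(sample, ddof=1)))
```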

score 2 · Answer 3 · answered Aug 02 '17 at 01:42

You can also avoid the subtraction by making use of the relation between the mean, variance and power of a signal (variance = power - mean ** 2, where the power is the mean of the squared values):

In [7]: import numpy as np

In [8]: a = np.random.rand(1000)

In [9]: %%timeit
   ...: a.mean()
   ...: a.var()
   ...: 
10000 loops, best of 3: 24.7 us per loop

In [10]: %%timeit
    ...: m = a.mean()
    ...: np.mean((a-m)**2)
    ...: 
100000 loops, best of 3: 18.5 us per loop

In [11]: %%timeit
    ...: m = a.mean()
    ...: power = np.mean(a ** 2)
    ...: power - m ** 2
    ...: 
100000 loops, best of 3: 17.3 us per loop

In [12]: %%timeit
    ...: m = a.mean()
    ...: power = np.dot(a, a) / a.size
    ...: power - m ** 2
    ...: 
100000 loops, best of 3: 9.16 us per loop
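One caveat with this shortcut, sketched below: subtracting two large, nearly equal numbers can lose precision when the mean is large compared with the spread. The function name mean_and_var_power and the 1e8 offset are hypothetical choices made for the illustration.

```python
import numpy as np

def mean_and_var_power(a):
    """Return (mean, population variance) via variance = power - mean**2.

    Hypothetical helper for illustration; assumes a 1-D real array.
    """
    a = np.asarray(a, dtype=float)
    m = a.mean()
    power = np.dot(a, a) / a.size   # mean of the squared values
    return m, power - m ** 2

rng = np.random.default_rng(0)
a = rng.random(1000)

# For data near zero, the shortcut agrees with a.var().
m, v = mean_and_var_power(a)
print(np.isclose(v, a.var()))

# With a large offset, power and m**2 are both around 1e16, so their
# difference can no longer resolve a variance of roughly 0.08.
shifted = a + 1e8
_, v_shifted = mean_and_var_power(shifted)
print(v_shifted, shifted.var())
```

If the data may have a large mean, the centered form (a - m) from the accepted answer is the safer choice, at the cost of the extra subtraction.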
