
Here is my code:

import numpy as np
print(np.std(np.array([0,1])))

it produces 0.5

I am confident that this is incorrect. What am I doing wrong?

user1700890

  • This is correct. `std = RMS(data - mean)`. In this case: `std = sqrt((0.5^2 + 0.5^2) / 2) = sqrt(0.25) = 0.5` – Mad Physicist Dec 02 '15 at 18:43
  • @MadPhysicist, thank you, I just got a bit confused between the sample and population std. Google Spreadsheets uses the sample standard deviation under `stdev`. – user1700890 Dec 02 '15 at 18:46
  • Set the optional `ddof` parameter to `1` to get the sample std: http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html – Mad Physicist Dec 02 '15 at 18:48
  • BTW, thanks for that import at the top. Most people leave it out, making their code harder to copy-and-paste into the console. – Mad Physicist Dec 02 '15 at 18:53
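The arithmetic in Mad Physicist's first comment can be checked directly; a quick sketch, using nothing beyond NumPy:

```python
import numpy as np

data = np.array([0, 1])
mean = data.mean()  # 0.5

# std = RMS(data - mean) = sqrt(((0 - 0.5)**2 + (1 - 0.5)**2) / 2)
rms = np.sqrt(np.mean((data - mean) ** 2))

print(rms)           # 0.5
print(np.std(data))  # 0.5 -- matches the result in the question
```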

1 Answer


By default, numpy.std returns the population standard deviation, in which case np.std([0,1]) is correctly reported to be 0.5. If you are looking for the sample standard deviation, you can supply an optional ddof parameter to std():

>>> np.std([0, 1], ddof=1)
0.70710678118654757

ddof modifies the divisor of the sum of squared deviations from the mean: the divisor is N - ddof, and the default of ddof = 0 (dividing by N) produces the 0.5 you observed.
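The N - ddof divisor can be verified by hand for the `[0, 1]` example; a minimal sketch of the formula above:

```python
import numpy as np

data = np.array([0, 1])
n = len(data)
ss = np.sum((data - data.mean()) ** 2)  # sum of squared deviations = 0.5

# The divisor is N - ddof: the default ddof=0 divides by N,
# while ddof=1 divides by N - 1 (the sample standard deviation).
print(np.sqrt(ss / (n - 0)), np.std(data, ddof=0))  # 0.5 both ways
print(np.sqrt(ss / (n - 1)), np.std(data, ddof=1))  # ~0.7071 both ways
```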

Mad Physicist
  • I think the default in numpy is the `population` standard deviation, which divides by N; the sample standard deviation divides by N - 1. – user1700890 Dec 02 '15 at 19:07
  • I admit that my terminology may be backwards. – Mad Physicist Dec 02 '15 at 19:08
  • Fixed. Thanks for the correction. – Mad Physicist Dec 02 '15 at 19:09
  • Hmm, I must be missing something, but on Wikipedia the ddof=0 version is called the "sample standard deviation", and of the ddof=1 case it says: "In that case the result of the original formula would be called the sample standard deviation. Dividing by n − 1 rather than by n gives an unbiased estimate of the variance of the larger parent population. This is known as Bessel's correction." In your answer, I think you have it backwards. – Johannes Schaub - litb Oct 30 '17 at 20:23
  • OK, this is as confusing as it can get, since the same term ("sample standard deviation") is used for two opposite things. Contrary to the article about Standard deviation, the article about Bessel's correction says *"This correction is so common that the terms "sample variance" and "sample standard deviation" are frequently used to mean the corrected estimators (unbiased sample variance, less biased sample standard deviation), using n − 1."* – Johannes Schaub - litb Oct 30 '17 at 20:55
  • Either the interpretation is "the standard deviation used when you only have samples" or "the standard deviation of the samples". But what is *not* ambiguous, I think, is the term "population standard deviation". And numpy doesn't return that; it returns the standard deviation of the samples (i.e. the uncorrected one). – Johannes Schaub - litb Oct 30 '17 at 20:57
  • @JohannesSchaub-litb. Would you mind making the correction? It has been a long time since I wrote this answer, and at this point I think you are much better versed in the terminology than I am. The only thing I was really able to understand unambiguously from your explanation is why I have always been confused by the terms :) – Mad Physicist Oct 30 '17 at 21:31