Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?

Question

A simple example of calculating the standard deviation:

d <- c(2,4,4,4,5,5,7,9)
sd(d)

yields

[1] 2.13809

but when done by hand, the answer is 2. What am I missing here?
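The "by hand" value of 2 comes from dividing the sum of squared deviations by N, while `sd()` divides by N − 1. Here is a minimal sketch that computes both directly on the question's data. The variable names are illustrative only.

```r
d <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(d)

# Sum of squared deviations from the mean (mean(d) is 5 here, so this is 32)
ss <- sum((d - mean(d))^2)

sd_by_hand   <- sqrt(ss / n)        # divisor N: the "by hand" result
sd_n_minus_1 <- sqrt(ss / (n - 1))  # divisor N - 1: what sd() computes

print(c(by_hand = sd_by_hand, n_minus_1 = sd_n_minus_1, sd = sd(d)))
```

Only the divisor differs. With 32/8 = 4 the square root is 2, and with 32/7 it is about 2.13809.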

That was exactly the issue. I suppose, then, that I should assume `sd` is calculating a sample standard deviation. Thanks for the insight, I appreciate it. I will be adding this to all my calculations: `d <- c(2,4,4,4,5,5,7,9); n <- length(d); sd(d)*sqrt((n-1)/n)` — Travis Rodman, Jun 23 '11 at 17:31
On that note, what is the R command that produces the standard deviation of the sample, so that the N-1 in the denominator does not need to be corrected? — Travis Rodman, Jun 23 '11 at 17:43

score 38 · Accepted Answer · Dirk Eddelbuettel · 2011-06-23T18:11:11.020

Try this

R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R>

and see the rest of the Wikipedia article for the discussion about estimating standard deviations. The formula you used "by hand" divides by N, which gives a biased estimate of the population variance. `sd()` divides by N − 1 instead, so multiplying its result by sqrt((N-1)/N) recovers your by-hand value. Here is a key quote:

The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals, (x_1 − x̄, ..., x_N − x̄).
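The bias mentioned above shows up in a small simulation. This sketch assumes normally distributed data with a true variance of 4 and samples of size 5. Those values, the seed, and the number of repetitions are arbitrary choices. Averaged over many samples, the divisor-N variance should land near (N − 1)/N × 4 = 3.2, while `var()` (divisor N − 1) should land near 4.

```r
set.seed(1)          # arbitrary seed, for reproducibility
n      <- 5          # sample size
sigma2 <- 4          # true population variance (sd = 2)
reps   <- 100000     # number of simulated samples

# Each column is one sample of size n
samples <- matrix(rnorm(n * reps, mean = 0, sd = sqrt(sigma2)), nrow = n)

var_n1 <- apply(samples, 2, var)   # divisor n - 1, as in R's var()
var_n  <- var_n1 * (n - 1) / n     # divisor n, the "by hand" formula

mean(var_n1)  # close to 4: unbiased
mean(var_n)   # close to 3.2: biased low by the factor (n - 1)/n
```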

edited Jun 23 '11 at 18:11

answered Jun 23 '11 at 16:56

Dirk Eddelbuettel



After reading the article further, and your comments, I can see why there wouldn't be a function that would produce the biased result. If I want it, I will have to define it, or calculate it myself. Thanks again, spot-on answer. – Travis Rodman Jun 23 '11 at 17:54
*If* you want such a function, it is easy to write: you already have `sd()`, and you simply need to multiply its result by `sqrt((N-1)/N)`, with N being your vector length. That is pretty much what my answer does, with `N` set to 8. – Dirk Eddelbuettel Jun 23 '11 at 18:09
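Following that suggestion, here is a minimal sketch of such a wrapper. The name `pop_sd` and the `na.rm` handling are illustrative choices, not part of base R.

```r
# Hypothetical helper: population standard deviation (divisor N), built on sd()
pop_sd <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values before counting N
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)      # rescale from divisor N - 1 to divisor N
}

pop_sd(c(2, 4, 4, 4, 5, 5, 7, 9))                    # approximately 2
pop_sd(c(2, 4, 4, 4, 5, 5, 7, 9, NA), na.rm = TRUE)  # NA dropped first
```

N is taken from the data rather than hard-coded as 8, and it is computed after any `NA` values are removed.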

It might be worth noting that `sd` also gives biased estimates for standard deviation (see e.g. this [Wikipedia article](http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation)). The `N-1` correction is there to ensure that `var` is unbiased. – pete Oct 24 '12 at 01:45
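A small simulation illustrates this point. It assumes normal data, samples of size 5, and a true sd of 2, all chosen arbitrarily. For normal samples, the expected value of `sd()` is c4(n) × σ, where c4(n) = sqrt(2/(n − 1)) Γ(n/2) / Γ((n − 1)/2). For n = 5 that constant is about 0.94, so `sd()` should average roughly 1.88 rather than 2.

```r
set.seed(2)          # arbitrary seed
n     <- 5           # sample size
sigma <- 2           # true standard deviation
reps  <- 100000      # number of simulated samples

samples <- matrix(rnorm(n * reps, sd = sigma), nrow = n)
s <- apply(samples, 2, sd)   # sd() of each sample (divisor n - 1)

# Bias-correction constant c4(n), valid for normally distributed data
c4 <- sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

mean(s)       # noticeably below sigma = 2
c4 * sigma    # theoretical mean of s, about 1.88
mean(s / c4)  # close to 2 once divided by c4
```

The c4 correction depends on the normality assumption. The N − 1 divisor makes `var()` unbiased for any distribution with finite variance, but it does not make `sd()` unbiased.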
The idea that every calculation of a standard deviation is necessarily an estimate seems misguided. I have many applications where I have an entire population sitting in front of me, and not having this function as a standard option in base R seems strange. Yes, it is easy to write my own function, but there is no reason why "there wouldn't be a function that would produce the biased result", because the result itself is not biased. There is a particular estimator that is biased, but not all calculations using that formula are calculations of an estimate using that estimator. – randy Jan 14 '17 at 17:53

score 9 · Answer 2 · answered Jun 23 '11 at 16:57


Looks like R is assuming (n-1) in the denominator, not n.


duffymo



ouch. You may want to delete that last comment. – Nick Sabbe Jun 23 '11 at 19:29

No, it isn't. Divisor n-1 gives the *sample* standard deviation; divisor n gives the *population* standard deviation. The variance would be sd^2, but again, that would be the sample variance, as R uses divisor n-1 in `var()`, just as it does in `sd()`. That R uses this divisor is clearly documented in `?sd`. – Gavin Simpson Jun 23 '11 at 20:02
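The point that `sd()` and `var()` share the n - 1 divisor can be checked directly on the question's data. This is a quick sketch, not an exhaustive test.

```r
d <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(d)

all.equal(sd(d)^2, var(d))                         # TRUE: sd() is sqrt(var())
all.equal(var(d), sum((d - mean(d))^2) / (n - 1))  # TRUE: divisor n - 1
var(d) * (n - 1) / n                               # population variance, about 4
```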

score 6 · Answer 3 · answered Jun 12 '15 at 21:27

When I want the population variance or standard deviation (n as denominator), I define these two vectorized functions.

  pop.var <- function(x) var(x) * (length(x)-1) / length(x)

  pop.sd <- function(x) sqrt(pop.var(x))

BTW, Khan Academy has a good discussion of population and sample standard deviation.

score -1 · Answer 4 · answered Aug 15 '17 at 10:58


Note that running the command

?sd

in R (for example, in RStudio) displays the help page for the function. In the Details section it states

Like var this uses denominator n - 1.


ThatDataGuy

