different results for standard deviation using numpy and R

Question

I get two different results when I compute the standard deviation with numpy and with R. I am probably missing something obvious, but what?

R code

x1=matrix(c(1,7,5,8,9,5,4,5,4,3,76,8),nrow=4)
std=sd(x1[,1])
mean=mean(x1[,1])
std=apply(X=x1,MARGIN=2,FUN=sd)
std



> x1=matrix(c(1,7,5,8,9,5,4,5,4,3,76,8),nrow=4)
> std=sd(x1[,1])
> std=apply(X=x1,MARGIN=2,FUN=sd)
> std
[1]  3.095696  2.217356 35.565667

Python code

import numpy as np

x1=np.matrix([[1.,9.,4.],[7.,5.,3.],[5.,4.,76.],[8.,5.,8.]])
std=np.apply_along_axis(func1d=np.std,axis=0,arr=x1)


std
Out[9]: array([  2.68095132,   1.92028644,  30.80077109])

score 13 · Answer 1 · answered Dec 20 '13 at 17:10

For future searches: R calculates the standard deviation with N - 1 as the denominator, while numpy uses N by default. To get the same result in numpy, set ddof (the "delta degrees of freedom") to 1:

x1.std(axis=0, ddof=1)

Note that you can save a lot of cruft by calling the std method directly instead of going through np.apply_along_axis:

In [33]: x1.std(axis=0)
Out[33]: matrix([[  2.68095132,   1.92028644,  30.80077109]])

In [34]: x1.std(axis=0, ddof=1)
Out[34]: matrix([[  3.09569594,   2.21735578,  35.56566697]])
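To make the role of the denominator concrete, here is a minimal sketch in plain Python (no numpy) that computes both versions for the first column of x1. The helper name std and its ddof argument are chosen here only to mirror numpy's parameter; they are illustrative, not numpy's implementation.

```python
import math

def std(values, ddof=0):
    """Standard deviation with denominator N - ddof (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))

first_column = [1.0, 7.0, 5.0, 8.0]

population_sd = std(first_column)          # denominator N, numpy's default
sample_sd = std(first_column, ddof=1)      # denominator N - 1, as in R's sd()

print(round(population_sd, 6), round(sample_sd, 6))  # 2.680951 3.095696
```

The two numbers match the first entries of the numpy and R outputs in the question.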

answered Dec 20 '13 at 17:10

danodonovan


Do you know if there is any variant of sklearn.preprocessing.scale(x1) that scales the data using R's definition of the standard deviation? – Donbeo Dec 20 '13 at 18:07
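One possible way to get R-style scaling is to standardize the columns by hand with the N - 1 standard deviation. The sketch below does this in plain Python; scale_columns is a hypothetical helper, not part of scikit-learn, and it assumes every column has at least two values and a non-zero spread. As far as I know, sklearn.preprocessing.scale divides by the N-denominator standard deviation, which is why its result differs from R's scale().

```python
import statistics

def scale_columns(rows):
    """Center each column and divide by its sample (N - 1) standard deviation."""
    columns = list(zip(*rows))                      # transpose rows into columns
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]    # stdev uses N - 1
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]

x1 = [[1.0, 9.0, 4.0], [7.0, 5.0, 3.0], [5.0, 4.0, 76.0], [8.0, 5.0, 8.0]]
scaled = scale_columns(x1)
```

With numpy, the same idea can be written as (x1 - x1.mean(axis=0)) / x1.std(axis=0, ddof=1).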

score 4 · Accepted Answer · edited May 23 '17 at 12:16

The R code below will get you the same answer as numpy. For reference, see the question "Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?" and http://en.wikipedia.org/wiki/Standard_deviation

  apply(x1, 2, function(x) sd(x) * sqrt((length(x) - 1) / length(x)) )
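The same correction factor can be checked from the Python side. This is a small sketch using only the standard library, where statistics.stdev plays the role of R's sd() (N - 1 denominator) and statistics.pstdev the role of numpy's default (N denominator); the function name to_population_sd is made up for illustration.

```python
import math
import statistics

# Columns of x1 from the question.
columns = [
    [1.0, 7.0, 5.0, 8.0],
    [9.0, 5.0, 4.0, 5.0],
    [4.0, 3.0, 76.0, 8.0],
]

def to_population_sd(values):
    """Rescale the N - 1 standard deviation to the N-denominator version."""
    n = len(values)
    return statistics.stdev(values) * math.sqrt((n - 1) / n)

converted = [to_population_sd(c) for c in columns]
direct = [statistics.pstdev(c) for c in columns]

for a, b in zip(converted, direct):
    print(round(a, 6), round(b, 6))  # each pair agrees
```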

edited May 23 '17 at 12:16

Community


answered Dec 20 '13 at 17:08

Jake Burkhead


score 4 · Answer 3 · answered Dec 20 '13 at 17:09

R's sd() deducts one degree of freedom to account for the mean being estimated from the same data, so it divides by N - 1 rather than N.

The NumPy equivalent of the R code is:

np.std(x1, axis = 0, ddof = 1)

answered Dec 20 '13 at 17:09

tchakravarty

