
I am trying to calculate the standard deviation per row.

Example input (values_for_all):

1  0.35 0.35 4.33 0.09 4.17 0.16 9.90 15.25 0.16 2.38 2.55 8.14 0.16 NA 0.16
2  8.75 3.22 7.34 0.56 2.43 0.23 1.20 8.45 1.26 NA NA 1.24 0.16 2.34 0.36

Code:

values_for_all[values_for_all == ''] <- NA
values_for_all[] <- lapply(values_for_all, as.numeric)
values_mean <- rowMeans(values_for_all, na.rm=TRUE) 

#calculating standard deviation per row 
SD <- rowSds(values_for_all, na.rm=TRUE)
SD

The first part (values_mean) works perfectly. Unfortunately, the SD part (rowSds) doesn't work.

jay.sf
student24

3 Answers


apply lets you apply a function to all rows of your data:

apply(values_for_all, 1, sd, na.rm = TRUE)

To compute the standard deviation for each column instead, replace the 1 with 2.
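
As a minimal sketch of the row/column distinction, the block below rebuilds the first four columns of the example input as a data frame and calls apply with both margins. The name `vals` and the column names `v1` to `v4` are assumptions made for illustration only.

```r
# Hypothetical subset of the example input: 2 rows, first 4 value columns.
vals <- data.frame(
  v1 = c(0.35, 8.75),
  v2 = c(0.35, 3.22),
  v3 = c(4.33, 7.34),
  v4 = c(0.09, 0.56)
)

# MARGIN = 1: one standard deviation per row.
row_sd <- apply(vals, 1, sd, na.rm = TRUE)

# MARGIN = 2: one standard deviation per column.
col_sd <- apply(vals, 2, sd, na.rm = TRUE)

length(row_sd)  # one value per row
length(col_sd)  # one value per column
```

Extra arguments after the function name, such as `na.rm = TRUE`, are passed on to `sd` for every row or column.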

MånsT
  • This is around 20% slower than `matrixStats::rowSds`, though; see https://stackoverflow.com/a/17551600/6574038 – jay.sf Oct 09 '20 at 09:12
  • @jay.sf That may well be true. `sd` and `mean` aren't very fast because they perform a lot of checks, traverse the data twice to lower the risk of floating-point errors, etc. If speed is a big concern, other solutions can definitely be preferable. – MånsT Oct 09 '20 at 10:06
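
If speed matters and matrixStats is not available, the row-wise standard deviation can also be computed with vectorized base R functions. The following is a rough sketch, not a benchmarked replacement: the helper name `row_sd_base` and the test matrix `m` are hypothetical, and the helper uses the usual two-pass formula with an n - 1 denominator, like `sd`.

```r
# Hypothetical helper: row-wise sample SD using only vectorized base R calls.
row_sd_base <- function(x) {
  x <- as.matrix(x)                          # work on a numeric matrix
  n <- rowSums(!is.na(x))                    # non-missing values per row
  centered <- x - rowMeans(x, na.rm = TRUE)  # subtract each row's own mean
  out <- sqrt(rowSums(centered^2, na.rm = TRUE) / (n - 1))
  out[n < 2] <- NA_real_                     # SD is undefined for fewer than 2 values
  out
}

# Made-up values in the style of the example input.
m <- rbind(c(0.35, 4.33, NA, 0.16),
           c(8.75, 3.22, 7.34, 0.56))

row_sd_base(m)
apply(m, 1, sd, na.rm = TRUE)  # should agree up to floating-point error
```

Whether this is actually faster than `apply` with `sd` on a given data set is something to measure rather than assume.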

You're using matrixStats, whose functions are meant to be applied to matrices, so wrap as.matrix around your data frame.

matrixStats::rowSds(as.matrix(dat), na.rm=TRUE)
# [1] 4.515403 3.050903
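
To see what the conversion does, here is a small sketch. The data frame `dat_small` is a made-up stand-in for `dat`, and the `requireNamespace` guard is only there so that the snippet also runs where matrixStats is not installed.

```r
# Hypothetical stand-in for `dat`: all columns numeric, one missing value.
dat_small <- data.frame(v1 = c(0.35, 8.75),
                        v2 = c(4.33, 3.22),
                        v3 = c(NA, 7.34))

m_small <- as.matrix(dat_small)
is.matrix(dat_small)  # FALSE: a data frame is a list of columns
is.matrix(m_small)    # TRUE: a numeric matrix

# Only call rowSds if the package is installed.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  matrixStats::rowSds(m_small, na.rm = TRUE)
}
```

If any column is still of type character, as.matrix returns a character matrix, which is why the as.numeric conversion from the question has to happen first.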
jay.sf

You can use apply:

apply(df, 1, sd)
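
Note that the example input contains NA values, and `sd` returns NA for any row with a missing value unless `na.rm = TRUE` is passed on through `apply`. A small sketch, with `df` filled with made-up values:

```r
# Hypothetical data frame; the first row contains a missing value.
df <- data.frame(v1 = c(0.35, 8.75),
                 v2 = c(4.33, 3.22),
                 v3 = c(NA, 7.34))

apply(df, 1, sd)                # first element is NA because of the missing value
apply(df, 1, sd, na.rm = TRUE)  # missing values are dropped before computing sd
```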
Chris Ruehlemann