R: Calculate the standard deviation of each column in a data.frame despite NA values

Question

Good morning, I have a lot of data to do calculations with. There are 25 columns (variables), and each column contains thousands of values, including missing values. I calculated the means with

colMeans(df, na.rm = TRUE)

How can I calculate the standard deviation of each column while ignoring the NA values?
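To make the answers below easy to try, here is a minimal reproducible sketch. The data frame `df` and its columns `a`, `b`, `c` are hypothetical stand-ins for the 25 real columns. It also shows why the question arises: `sd()` returns `NA` for any column that contains a missing value unless `na.rm = TRUE` is passed.

```r
# Hypothetical example data: three numeric columns with scattered NAs
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 50),
  c = c(NA, 0.5, 1.5, 2.5)
)

# Column means, skipping NAs (as in the question)
colMeans(df, na.rm = TRUE)

# sd() also propagates NA unless told otherwise
sd(df$a)                # NA
sd(df$a, na.rm = TRUE)  # SD of 1, 2 and 4 only
```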

Relevant: http://stackoverflow.com/questions/20794284/means-and-sd-for-columns-in-a-dataframe-with-na-values — thelatemail, Jun 14 '16 at 09:28

score 11 · Accepted Answer · Sotos · 2016-06-14T09:15:57.073

You can try:

apply(df, 2, sd, na.rm = TRUE)

Because apply first converts the data frame to a matrix (which can silently coerce column types), a more direct and safer option is to use lapply or sapply, as noted by @docendodiscimus:

sapply(df, sd, na.rm = TRUE)
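A quick check on a small hypothetical data frame (all numeric, columns `a`, `b`, `c`) that both calls give the same named vector, one SD per column. The last line uses `MARGIN = 1`, which returns one SD per row instead, matching the explanation of the second argument in the comments.

```r
# Hypothetical all-numeric data with missing values
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 50),
  c = c(NA, 0.5, 1.5, 2.5)
)

# apply() with MARGIN = 2 works column by column on as.matrix(df)
by_apply <- apply(df, 2, sd, na.rm = TRUE)

# sapply() treats the data frame as a list of columns, no matrix conversion
by_sapply <- sapply(df, sd, na.rm = TRUE)

all.equal(by_apply, by_sapply)  # TRUE: same named numeric vector

# MARGIN = 1 gives one SD per row instead
apply(df, 1, sd, na.rm = TRUE)
```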

edited Jun 14 '16 at 09:15

answered Jun 14 '16 at 08:11

Sotos



@Ernsthaft, 2 means that we are iterating over every column of the data frame. If we put 1, then we are iterating over rows, and with 1:2 over each individual value – Sotos Jun 14 '16 at 08:27

Ok, that was new and interesting for me. Thank you for helping a beginner :) – Ernsthaft Jun 14 '16 at 08:34

Take care when using `apply` on a `data.frame`, since it converts it to a matrix, which in turn may result in unexpected type coercion. To iterate over data.frame columns, it's usually safer to use `lapply` or `sapply`, i.e. `sapply(df, sd, na.rm = TRUE)` – talat Jun 14 '16 at 08:56
What sort of type coercion could one realistically expect? One does not normally calculate the standard deviation of characters/factors? – jiggunjer Dec 11 '18 at 06:05
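One realistic case, sketched with a hypothetical data frame: a character ID column stored next to the numeric measurements. `as.matrix()`, which `apply()` calls internally on a data frame, then produces a character matrix, numbers included. Selecting the numeric columns first (here with base R's `Filter()`) and iterating over them with `sapply()` avoids that.

```r
# Hypothetical data: a character ID column alongside numeric measurements
df_mixed <- data.frame(
  id = c("s1", "s2", "s3"),
  x  = c(1.5, NA, 3.25),
  y  = c(100, 200, NA),
  stringsAsFactors = FALSE
)

# apply() first calls as.matrix(), which turns every column into character
m <- as.matrix(df_mixed)
typeof(m)  # "character"

# Safer: keep only numeric columns, then iterate over them as a list
num_cols <- Filter(is.numeric, df_mixed)
sapply(num_cols, sd, na.rm = TRUE)
```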

score 3 · Answer 2 · answered Jun 14 '16 at 08:35


If we convert the data frame to a matrix, colSds() from matrixStats can be used:

library(matrixStats)
colSds(as.matrix(df), na.rm=TRUE)
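If matrixStats is not available, the same per-column sample standard deviation can be sketched in base R with vectorised matrix operations. This is not how `colSds()` is implemented, just an equivalent calculation; it assumes an all-numeric data frame (the hypothetical `df` below) with at least two non-missing values per column.

```r
# Hypothetical all-numeric data with missing values
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 50),
  c = c(NA, 0.5, 1.5, 2.5)
)
m <- as.matrix(df)

# Number of non-missing values per column
n <- colSums(!is.na(m))

# Deviations from each column's mean (NAs stay NA)
centered <- sweep(m, 2, colMeans(m, na.rm = TRUE))

# Sample SD with denominator n - 1, matching sd(x, na.rm = TRUE)
col_sd <- sqrt(colSums(centered^2, na.rm = TRUE) / (n - 1))
col_sd
```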

Or we can use summarise_each() from dplyr:

library(dplyr)
df %>%
    summarise_each(funs(sd(., na.rm=TRUE)))

answered Jun 14 '16 at 08:35

akrun


score 0 · Answer 3 · edited Aug 13 '18 at 16:43


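As the function summarise_each() has been deprecated, here is an up-to-date example using dplyr. Note that na.rm must be TRUE for the missing values to be ignored, and that funs() has since been deprecated as well, so a formula lambda is used instead:

library(dplyr)
df %>% summarise_all(~ sd(., na.rm = TRUE))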
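In dplyr 1.0.0 and later, scoped verbs such as `summarise_all()` are superseded by `across()`. A minimal sketch on a hypothetical `df`; `where(is.numeric)` restricts the summary to numeric columns, so a character ID column would simply be skipped.

```r
library(dplyr)

# Hypothetical all-numeric data with missing values
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 50),
  c = c(NA, 0.5, 1.5, 2.5)
)

# One row, one SD per numeric column, NAs ignored
col_sd <- df %>%
  summarise(across(where(is.numeric), ~ sd(.x, na.rm = TRUE)))
col_sd
```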

edited Aug 13 '18 at 16:43

Ralf


answered Aug 13 '18 at 16:22

Dan


score 0 · Answer 4 · answered Sep 27 '18 at 11:28


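sd(variablename, na.rm = TRUE)

This works for me. Replace "variablename" with the variable you use (for example, one column of the data frame); note that it returns the SD of that single variable, not of every column at once.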
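What `na.rm = TRUE` changes, shown on a plain vector, plus a loop over a hypothetical data frame that applies the same call to each column by name:

```r
x <- c(1, 2, NA)

sd(x)                # NA: any missing value propagates
sd(x, na.rm = TRUE)  # SD of 1 and 2 only, i.e. sqrt(0.5)

# Hypothetical data frame; the same call, one column at a time by name
df <- data.frame(a = c(1, 2, NA, 4), b = c(10, NA, 30, 50))
for (col in names(df)) {
  cat(col, sd(df[[col]], na.rm = TRUE), "\n")
}
```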

answered Sep 27 '18 at 11:28

Cindy Wang
