Calculate standard deviation across multiple rows grouped by ID

Question

I want to calculate the standard deviation across multiple rows (not per row) and then save the results into a new data frame. Best to explain using an example.

Data:

ID <- c("a","a","a","a","b","b","b","b","c","c","c","c")
y1 <- c(8,9,3,6,6,4,5,8,7,5,8,1)
y2 <- c(3,6,6,1,7,3,8,7,5,8,1,7)
y3 <- c(9,3,1,8,4,6,3,8,4,6,5,7)
df <- data.frame(ID, y1, y2, y3)

   ID y1 y2 y3
1   a  8  3  9
2   a  9  6  3
3   a  3  6  1
4   a  6  1  8
5   b  6  7  4
6   b  4  3  6
7   b  5  8  3
8   b  8  7  8
9   c  7  5  4
10  c  5  8  6
11  c  8  1  5
12  c  1  7  7

I want to calculate the standard deviation of ID$a, ID$b and ID$c and store in a new data frame. I know I can do this:

sd_a <- sd(as.matrix(subset(df, ID == "a")), na.rm = TRUE)
sd_b <- sd(as.matrix(subset(df, ID == "b")), na.rm = TRUE)
sd_c <- sd(as.matrix(subset(df, ID == "c")), na.rm = TRUE)

ID <- c("a","b","c")
sd <- c(sd_a,sd_b,sd_c)
df2 <- data.frame(ID, sd)

  ID       sd
1  a 2.958040
2  b 1.912875
3  c 2.386833

But is there a more straightforward way of achieving this?

score 2 · Answer 1 · answered Jul 18 '20 at 15:27

2

One dplyr solution could be:

df %>%
 group_by(ID) %>%
 summarise(sd = sd(unlist(cur_data())))

  ID       sd
  <fct> <dbl>
1 a      2.96
2 b      1.91
3 c      2.39

answered Jul 18 '20 at 15:27

tmfmnk

38,881
4
47
67

Darren Tsai · Accepted Answer · 2020-07-19T09:24:15.363

2

You can use pivot_longer() to stack y1 to y3 and then calculate the sd.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(y1:y3) %>%
  group_by(ID) %>%
  summarise(sd = sd(value))

# # A tibble: 3 x 2
#   ID       sd
#   <chr> <dbl>
# 1 a      2.96
# 2 b      1.91
# 3 c      2.39

edited Jul 19 '20 at 09:24

answered Jul 18 '20 at 15:31

Darren Tsai

32,117
5
21
51

score 1 · Answer 3 · answered Jul 18 '20 at 15:40

1

In base R you can do:

aggregate(values ~ ID, cbind(df[1], stack(df[-1])), sd)

  ID   values
1  a 2.958040
2  b 1.912875
3  c 2.386833

answered Jul 18 '20 at 15:40

Ritchie Sacramento

29,890
4
48
56

Calculate standard deviation across multiple rows grouped by ID

3 Answers3