2

I want to calculate the standard deviation across multiple rows (not per row) and then save the results into a new data frame. Best to explain using an example.

Data:

ID <- c("a","a","a","a","b","b","b","b","c","c","c","c")
y1 <- c(8,9,3,6,6,4,5,8,7,5,8,1)
y2 <- c(3,6,6,1,7,3,8,7,5,8,1,7)
y3 <- c(9,3,1,8,4,6,3,8,4,6,5,7)
df <- data.frame(ID, y1, y2, y3)

   ID y1 y2 y3
1   a  8  3  9
2   a  9  6  3
3   a  3  6  1
4   a  6  1  8
5   b  6  7  4
6   b  4  3  6
7   b  5  8  3
8   b  8  7  8
9   c  7  5  4
10  c  5  8  6
11  c  8  1  5
12  c  1  7  7

I want to calculate the standard deviation of ID$a, ID$b and ID$c and store in a new data frame. I know I can do this:

sd_a <- sd(as.matrix(subset(df, ID == "a")), na.rm = TRUE)
sd_b <- sd(as.matrix(subset(df, ID == "b")), na.rm = TRUE)
sd_c <- sd(as.matrix(subset(df, ID == "c")), na.rm = TRUE)

ID <- c("a","b","c")
sd <- c(sd_a,sd_b,sd_c)
df2 <- data.frame(ID, sd)

  ID       sd
1  a 2.958040
2  b 1.912875
3  c 2.386833

But is there a more straightforward way of achieving this?

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
kjtheron
  • 191
  • 1
  • 11

3 Answers3

2

One dplyr solution could be:

df %>%
 group_by(ID) %>%
 summarise(sd = sd(unlist(cur_data())))

  ID       sd
  <fct> <dbl>
1 a      2.96
2 b      1.91
3 c      2.39
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
2

You can use pivot_longer() to stack y1 to y3 and then calculate the sd.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(y1:y3) %>%
  group_by(ID) %>%
  summarise(sd = sd(value))

# # A tibble: 3 x 2
#   ID       sd
#   <chr> <dbl>
# 1 a      2.96
# 2 b      1.91
# 3 c      2.39
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
1

In base R you can do:

aggregate(values ~ ID, cbind(df[1], stack(df[-1])), sd)

  ID   values
1  a 2.958040
2  b 1.912875
3  c 2.386833
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56