Get mean values per sample, arranged by another column of ID's

Question

Here is a problem that I would like to solve using the tidyverse. It's difficult to explain but the example lays it out.

I have a dataset with several names. Each name is represented by 1 or more ID's (ranges from 1-5 in the real data). Each ID has 5 values, ordered by the draw number.

library(tidyverse)

set.seed(190)

dplyr::tibble(nms = c(rep("A", 5), rep("B",10), rep("C", 10)),
              draw = c(rep(1:5, times = 5)),
              id = c(rep(c("A1", "B1", "B2", "C1", "C2"), each = 5)),
              val = c(sample(1:4, replace = T, size = 25))) %>%  print(n=25)
#> # A tibble: 25 × 4
#>    nms    draw id      val
#>    <chr> <int> <chr> <int>
#>  1 A         1 A1        3
#>  2 A         2 A1        4
#>  3 A         3 A1        2
#>  4 A         4 A1        4
#>  5 A         5 A1        1
#>  6 B         1 B1        2
#>  7 B         2 B1        1
#>  8 B         3 B1        3
#>  9 B         4 B1        3
#> 10 B         5 B1        2
#> 11 B         1 B2        1
#> 12 B         2 B2        2
#> 13 B         3 B2        2
#> 14 B         4 B2        1
#> 15 B         5 B2        2
#> 16 C         1 C1        2
#> 17 C         2 C1        1
#> 18 C         3 C1        1
#> 19 C         4 C1        3
#> 20 C         5 C1        4
#> 21 C         1 C2        1
#> 22 C         2 C2        4
#> 23 C         3 C2        4
#> 24 C         4 C2        2
#> 25 C         5 C2        4

^{Created on 2023-05-31 with reprex v2.0.2}

I would like to get the average of each draw for each name nms. This means getting the average val of each draw for every individual id. Some names like A only have one id, so the average val, or mean_val, will be the same, but since B and C have multiple id's, the mean_val will be averaged for each draw. I would like the results to look like this

# A tibble: 15 × 3
#   nms    draw   mean_val
#   <chr> <int> <dbl>
# 1 A         1   3  
# 2 A         2   4  
# 3 A         3   2  
# 4 A         4   4  
# 5 A         5   1  
# 6 B         1   1.5
# 7 B         2   1.5
# 8 B         3   2.5
# 9 B         4   2  
#10 B         5   2  
#11 C         1   1.5
#12 C         2   2.5
#13 C         3   2.5
#14 C         4   2.5
#15 C         5   4

There are many questions already answered that address this. `df %>% group_by(nms, draw) %>% summarise(mean_val = mean(val))` — Sarah, May 31 '23 at 22:27
Thanks everyone I realize I was hung up on additional formatting in the df — jnat, May 31 '23 at 22:34

score 0 · Accepted Answer · answered May 31 '23 at 22:25

I think this will give you what you want or at least get you close.

output <- df %>% 
  group_by(nms, draw) %>% 
  summarize(mean_val = mean(val))

# A tibble: 15 × 3
# Groups:   nms [3]
   nms    draw mean_val
   <chr> <int>    <dbl>
 1 A         1      3  
 2 A         2      4  
 3 A         3      2  
 4 A         4      4  
 5 A         5      1  
 6 B         1      1.5
 7 B         2      1.5
 8 B         3      2.5
 9 B         4      2  
10 B         5      2  
11 C         1      1.5
12 C         2      2.5
13 C         3      2.5
14 C         4      2.5
15 C         5      4

Get mean values per sample, arranged by another column of ID's

1 Answers1