1

I haven't found a good solution for this question in a while. What I want to do is to average certain rows from a dataframe based on an ID and create a differnt dataframe. Let's say I have a dataframe that looks like this:

Data

structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5

Now I want a second dataframe with the average per variable x per ID code also keeping the name and location. Dataframe of mean values:

ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1    1  2  3  4   5
A2   S.coli   Indv2    1  2  3  4   5

I have many ID codes so subsetting and then joining the tables is almost like doing manually. I was wondering if there is a more effective way to do it. Thank you in advance!!

Tho Vu
  • 1,304
  • 2
  • 8
  • 20
Natz
  • 77
  • 6

2 Answers2

0

We can use

library(dplyr)
df %>%
   group_by(across(ID:Location)) %>%
   summarise(across(everything(), mean, na.rm = TRUE))
# A tibble: 2 x 8
# Groups:   ID, Name [2]
#  ID    Name   Location    x1    x2    x3    x4    x5
#  <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A1    S.coli Indv1        1     2     3     4     5
#2 A2    S.coli Indv2        1     2     3     4     5

data

df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Same logic as @Akrun for older dplyr version

library(dplyr)
df %>% 
  group_by(ID, Name, Location) %>% 
  summarise_at(vars(x1:x5), mean, na.rm = TRUE)
# Groups:   ID, Name [2]
#   ID    Name   Location    x1    x2    x3    x4    x5
# <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A1    S.coli Indv1        1     2     3     4     5
# 2 A2    S.coli Indv2        1     2     3     4     5
Tho Vu
  • 1,304
  • 2
  • 8
  • 20