How to create a new dataframe extracting means based on subgroups given by a specific column in R

Question

I haven't found a good solution for this question in a while. What I want to do is to average certain rows from a dataframe based on an ID and create a differnt dataframe. Let's say I have a dataframe that looks like this:

Data

structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))

ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5

Now I want a second dataframe with the average per variable x per ID code also keeping the name and location. Dataframe of mean values:

ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1    1  2  3  4   5
A2   S.coli   Indv2    1  2  3  4   5

I have many ID codes so subsetting and then joining the tables is almost like doing manually. I was wondering if there is a more effective way to do it. Thank you in advance!!

score 0 · Accepted Answer · answered Aug 14 '20 at 20:19

We can use

library(dplyr)
df %>%
   group_by(across(ID:Location)) %>%
   summarise(across(everything(), mean, na.rm = TRUE))
# A tibble: 2 x 8
# Groups:   ID, Name [2]
#  ID    Name   Location    x1    x2    x3    x4    x5
#  <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A1    S.coli Indv1        1     2     3     4     5
#2 A2    S.coli Indv2        1     2     3     4     5

data

df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))

Tho Vu · Answer 2 · 2020-08-14T20:36:05.790

0

Same logic as @Akrun for older dplyr version

library(dplyr)
df %>% 
  group_by(ID, Name, Location) %>% 
  summarise_at(vars(x1:x5), mean, na.rm = TRUE)
# Groups:   ID, Name [2]
#   ID    Name   Location    x1    x2    x3    x4    x5
# <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A1    S.coli Indv1        1     2     3     4     5
# 2 A2    S.coli Indv2        1     2     3     4     5

edited Aug 14 '20 at 20:36

answered Aug 14 '20 at 20:30

Tho Vu

1,304
2
8
20

How to create a new dataframe extracting means based on subgroups given by a specific column in R

2 Answers2

data