How to find the average of several lines with the same id in a big R dataframe?

Question

I have a big data frame (more than 100,000 entries) that looks something like this:

ID     Pre   temp  day  
134    10      6    1       
134    20      7    1        
134    10      8    1
234    5       1    2 
234    10      4    2 
234    15      10   3

I want to reduce my data frame by taking the mean of Pre, temp and day for each ID value. In the end, my data frame would look something like this:

ID   Pre   temp  day
134  13.3   7     1
234  10     5     2.3

I'm not sure how to do it.

Thank you in advance !

Just wanted to add the data.table solution: ```dt[,.SD[,.(mean(Pre), mean(temp), mean(day))], by="ID"] ``` — J_Alaniz, Jul 13 '20 at 15:04
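
A slightly more idiomatic form of the data.table approach in the comment above lets `.SD` cover every non-grouping column, so the result keeps the original column names. This is a minimal sketch, assuming the data.table package is installed; the name `dt` and the sample values (copied from the question) are for illustration only.

```r
library(data.table)

# Sample data from the question; `dt` is an illustrative name
dt <- data.table(
  ID   = c(134, 134, 134, 234, 234, 234),
  Pre  = c(10, 20, 10, 5, 10, 15),
  temp = c(6, 7, 8, 1, 4, 10),
  day  = c(1, 1, 1, 2, 2, 3)
)

# .SD holds every column except the grouping column, so lapply()
# takes the mean of each one and keeps the column names
result_dt <- dt[, lapply(.SD, mean), by = ID]
print(result_dt)
```

Because the columns are not listed by name, the same call should work unchanged if more numeric columns are added later.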

score 0 · Answer 1 · answered Jul 13 '20 at 14:55

With the dplyr package, you can group_by your ID value and then use summarise to take the mean of each column:

library(dplyr)
df %>% 
  group_by(ID) %>% 
  summarise(Pre = mean(Pre),
            temp = mean(temp),
            day = mean(day))
# A tibble: 2 x 4
     ID   Pre  temp   day
  <dbl> <dbl> <dbl> <dbl>
1   134  13.3     7  1   
2   234  10       5  2.33
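
For comparison, the same per-ID means can be computed in base R without loading any package. The following is a minimal sketch using `aggregate()`; the data frame `df` is rebuilt here from the question's sample values so the snippet runs on its own.

```r
# Sample data from the question
df <- data.frame(
  ID   = c(134, 134, 134, 234, 234, 234),
  Pre  = c(10, 20, 10, 5, 10, 15),
  temp = c(6, 7, 8, 1, 4, 10),
  day  = c(1, 1, 1, 2, 2, 3)
)

# The formula `. ~ ID` means "all other columns, grouped by ID"
result_base <- aggregate(. ~ ID, data = df, FUN = mean)
print(result_base)
```

Note that the formula interface of `aggregate()` drops rows containing NA by default, which may or may not be what you want on real data.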

score 0 · Answer 2 · answered Jul 13 '20 at 14:57

With dplyr, a solution looks like this:

textFile <- "ID     Pre   temp  day  
134    10      6    1       
134    20      7    1        
134    10      8    1
234    5       1    2 
234    10      4    2 
234    15      10   3"

data <- read.table(text = textFile,header=TRUE)

library(dplyr)

data %>%
  group_by(ID) %>%
  summarise(Pre = mean(Pre), temp = mean(temp), day = mean(day))

...and the output:

# A tibble: 2 x 4
     ID   Pre  temp   day
  <int> <dbl> <dbl> <dbl>
1   134  13.3     7  1   
2   234  10       5  2.33

score 0 · Answer 3 · answered Jul 13 '20 at 14:57

You can try the following:

library(dplyr)

#Data
df <- structure(list(ID = c(134L, 134L, 134L, 234L, 234L, 234L), Pre = c(10L, 
20L, 10L, 5L, 10L, 15L), temp = c(6L, 7L, 8L, 1L, 4L, 10L), day = c(1L, 
1L, 1L, 2L, 2L, 3L)), class = "data.frame", row.names = c(NA, 
-6L))

#Code
df %>% group_by(ID) %>% summarise_all(mean, na.rm = TRUE)

# A tibble: 2 x 4
     ID   Pre  temp   day
  <int> <dbl> <dbl> <dbl>
1   134  13.3     7  1   
2   234  10       5  2.33

There is no need to specify each variable individually.
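
In dplyr 1.0.0 and later, `summarise_all()` is marked as superseded in favour of `across()`. A sketch of the equivalent call, assuming dplyr 1.0.0 or newer is installed; the data frame is rebuilt from the question's sample values so the snippet is self-contained.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID   = c(134L, 134L, 134L, 234L, 234L, 234L),
  Pre  = c(10L, 20L, 10L, 5L, 10L, 15L),
  temp = c(6L, 7L, 8L, 1L, 4L, 10L),
  day  = c(1L, 1L, 1L, 2L, 2L, 3L)
)

# Inside a grouped summarise(), everything() selects all
# non-grouping columns; the lambda ignores NA values
result_across <- df %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
print(result_across)
```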
