How to groupby dataframe based on a specific column and add the aggregations as a separate column in R?

Question

I've a dataframe that looks like this:

  PM2.5 PM10 SO2 datetime
1     4    4   7  2013-3-1
2     8    4   7  2013-3-1
3     7    7   3  2013-3-1
4     6    6   3  2013-3-2
5     3    3   3  2013-3-2
6     5    5   4  2013-3-2

Now i want to groupby all the columns based on datetime column and after manipulations the resultant dataframe should look like shown below:

       PM2.5       PM10       SO2    datetime   PM2.5_mean  PM10_mean  SO2_mean  PM2.5_min  PM10_min  SO2_min   PM2.5_max  PM10_max  SO2_max
1    [4,8,7]    [4,4,7]   [7,7,3]    2013-3-1         6.33          5      5.66          4         4        3           8         8        7  
2    [6,3,5]    [6,3,5]   [3,3,4]    2013-3-2         4.66       4.66      3.33          3         3        3           6         6        4

I tried to apply aggregate function, but with that i can only get either mean / min / max. But i want to mutate the mean, min, max as separate columns for each existing column in the dataframe. How can i do it? or Is there any other way by which i can get the required result?

akrun · Accepted Answer · 2020-01-02T22:31:18.743

An option is after grouping by 'datetime', get the mean/max of the rest of the columns in mutate_at, add that in group_by and then paste the initial columns

library(dplyr)
df1 %>% 
   group_by(datetime) %>%
   mutate_at(vars(-group_cols()), list(mean = mean, max = max)) %>% 
   group_by_at(vars(matches('(mean|max)$')), .add = TRUE) %>% 
   summarise_at(vars(-group_cols()), ~ sprintf("[%s]", toString(.)))
# A tibble: 2 x 10
# Groups:   datetime, PM2.5_mean, PM10_mean, SO2_mean, PM2.5_max, PM10_max [2]
#  datetime PM2.5_mean PM10_mean SO2_mean PM2.5_max PM10_max SO2_max PM2.5     PM10      SO2      
#  <chr>         <dbl>     <dbl>    <dbl>     <int>    <int>   <int> <chr>     <chr>     <chr>    
#1 2013-3-1       6.33      5        5.67         8        7       7 [4, 8, 7] [4, 4, 7] [7, 7, 3]
#2 2013-3-2       4.67      4.67     3.33         6        6       4 [6, 3, 5] [6, 3, 5] [3, 3, 4]

data

df1 <- structure(list(PM2.5 = c(4L, 8L, 7L, 6L, 3L, 5L), PM10 = c(4L, 
4L, 7L, 6L, 3L, 5L), SO2 = c(7L, 7L, 3L, 3L, 3L, 4L), datetime = c("2013-3-1", 
"2013-3-1", "2013-3-1", "2013-3-2", "2013-3-2", "2013-3-2")),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

How to groupby dataframe based on a specific column and add the aggregations as a separate column in R?

1 Answers1

data