1

With dplyr and R, you can use group_by and summarize to aggregate data.

For instance:

mpg_cyl_carb <- mtcars %>% 
  group_by(cyl, carb) %>% 
  summarise(var1 = mean(mpg))

head(mpg_cyl_carb, 3)
A tibble: 3 x 3
Groups:   cyl [2]
    cyl  carb  var1
  <dbl> <dbl> <dbl>
1     4     1  27.6
2     4     2  25.9
3     6     1  19.8

It means that when cyl = 4 and carb = 4, the mean for mpg is 27.6. When cyl = 6 and carb = 1, the mean is 19.8, and so on.

I would like to nest those aggregate results on the original dataframe. Currently am joining two tables to do this:

> mtcars %>% 
+   left_join(mpg_cyl_carb, by = c("cyl", "carb")) %>% 
+   head(3) %>% 
+   select(mpg, cyl, carb, var1)
   mpg cyl carb  var1
1 21.0   6    4 19.75
2 21.0   6    4 19.75
3 22.8   4    1 27.58

But is there an easier way? A single command for mutate, like:

> mtcars %>% 
+   mutate(. . .)

Not a solution using if_else, as it would add complexity.

BMLopes
  • 546
  • 6
  • 10

1 Answers1

1

Use the group_by before the mutate to create the mean column by group - instead of creating a summarised dataset and then joining to original data

library(dplyr)
mtcars %>% 
   group_by(cyl, carb) %>%
   mutate(var1 = mean(mpg)) %>%
   ungroup %>%
   head
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Generally, group_by is followed by a summarise command, but this is a good example where group_by + mutate go together well. Thank you! – BMLopes Dec 29 '21 at 21:39