dplyr group_by dynamic cols

Question

What is the consensus on the best way to use group_by when the grouping columns are supplied through a variable? Consider the following simple function:

library(dplyr)

myFunction <- function(df, col_name) {

  out <- df %>%
    group_by(col_name) %>%
    summarize(mean = mean(mpg))

  return(out)
}

myFunction(mtcars, col_name = c('cyl', 'am'))

The call to this function returns an error stating the column doesn't exist. I understand why, but I am not sure of the best approach to get around this. I can make it work if I only have one grouping variable by doing:

group_by(!!as.name(col_name))

This, however, doesn't work if col_name is a vector of length greater than 1.

Any ideas?

You can take a variable number of bare column names in a `...` argument, then use `group_by(!!!quos(...))`. You might also be able to use the newer `{{ }}` tidyeval notation for a list like `...`, but I'm not sure about that — camille, Feb 26 '20 at 15:19
Does this answer your question? [dplyr - groupby on multiple columns using variable names](https://stackoverflow.com/questions/34487641/dplyr-groupby-on-multiple-columns-using-variable-names) — camille, Feb 26 '20 at 15:23
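
The `...` approach from the first comment can be sketched roughly as follows. This is only an illustration: the function name `mean_mpg_by_dots` is hypothetical, and it assumes the caller passes bare column names rather than a character vector, so it changes the function's interface compared with the question.

```r
library(dplyr)

# Hypothetical helper: grouping columns arrive as bare names through `...`
mean_mpg_by_dots <- function(df, ...) {
  df %>%
    # The dots are forwarded to group_by() unchanged;
    # group_by(!!!quos(...)) from the comment should behave the same way
    group_by(...) %>%
    summarize(mean = mean(mpg), .groups = "drop")
}

# Bare names, not strings
mean_mpg_by_dots(mtcars, cyl, am)
```

If the column names are only available as strings, as in the question, this form does not apply directly and a string-based selection is still needed.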

score 2 · Answer 1 · answered Feb 26 '20 at 15:16

You can try:

myFunction <- function(df, col_name) {
 out <- df %>%
  group_by_at(vars(one_of(col_name))) %>%
  summarize(mean = mean(mpg))

 return(out)
}

myFunction(mtcars, col_name = c("cyl", "am"))

    cyl    am  mean
  <dbl> <dbl> <dbl>
1     4     0  22.9
2     4     1  28.1
3     6     0  19.1
4     6     1  20.6
5     8     0  15.0
6     8     1  15.4
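
Since dplyr 1.0.0, `group_by_at()` and `one_of()` are marked as superseded in favour of `across()` and `all_of()`/`any_of()`. A minimal sketch of the same idea in the newer style, assuming dplyr >= 1.0.0 is installed (the name `mean_mpg_by` is made up for this example):

```r
library(dplyr)

# Hypothetical rewrite of myFunction using across() + all_of()
mean_mpg_by <- function(df, col_name) {
  df %>%
    # all_of() selects the columns named in the character vector and
    # raises an error if any of them is missing from df
    group_by(across(all_of(col_name))) %>%
    # .groups = "drop" returns an ungrouped tibble
    summarize(mean = mean(mpg), .groups = "drop")
}

mean_mpg_by(mtcars, col_name = c("cyl", "am"))
```

The one difference from the answer above is `.groups = "drop"`, which returns an ungrouped result; leave it out to keep the default behaviour of dropping only the last grouping level. The multi-column analogue of the `!!as.name(col_name)` idiom from the question would be `group_by(!!!rlang::syms(col_name))`, which should give the same groups.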
