What is the consensus on the best way to use `group_by` when the grouping columns are passed in as a variable? Consider the following simple function:

library(dplyr)

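myFunction <- function(df, col_name) {
  # col_name is a character vector, so group_by() looks for a column
  # literally named "col_name" and fails
  out <- df %>%
    group_by(col_name) %>%
    summarize(mean = mean(mpg))

  return(out)
}

myFunction(mtcars, col_name = c('cyl', 'am'))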

Calling this function returns an error stating that the column doesn't exist. I understand why, but I'm not sure of the best way around it. I can make it work with a single grouping variable by doing:

group_by(!!as.name(col_name)) 

This, however, doesn't work if col_name is a vector of length greater than 1.

Any ideas?

camille
user1658170
  • You can take a variable number of bare column names in a `...` argument, then use `group_by(!!!quos(...))`. You might also be able to use the newer `{{ }}` tidyeval notation for a list like `...`, but I'm not sure about that – camille Feb 26 '20 at 15:19
  • Does this answer your question? [dplyr - groupby on multiple columns using variable names](https://stackoverflow.com/questions/34487641/dplyr-groupby-on-multiple-columns-using-variable-names) – camille Feb 26 '20 at 15:23
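
The `...` idea from the first comment could look roughly like the sketch below. The function names `my_function_dots` and `my_function_syms` are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later. The second variant keeps the character-vector interface from the question by converting the strings to symbols with `rlang::syms()` and splicing them with `!!!`.

```r
library(dplyr)

# Hypothetical variant: bare column names are forwarded through `...`
my_function_dots <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarize(mean = mean(mpg), .groups = "drop")
}

# Hypothetical variant: a character vector is turned into symbols,
# then spliced into group_by() with !!!
my_function_syms <- function(df, col_name) {
  df %>%
    group_by(!!!rlang::syms(col_name)) %>%
    summarize(mean = mean(mpg), .groups = "drop")
}

res_dots <- my_function_dots(mtcars, cyl, am)
res_syms <- my_function_syms(mtcars, col_name = c("cyl", "am"))
```

Both calls group by `cyl` and `am`; the first takes unquoted names, the second takes strings.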

1 Answer

You can try:

myFunction <- function(df, col_name) {
 out <- df %>%
  group_by_at(vars(one_of(col_name))) %>%
  summarize(mean = mean(mpg))

 return(out)
}

myFunction(mtcars, col_name = c("cyl", "am"))

    cyl    am  mean
  <dbl> <dbl> <dbl>
1     4     0  22.9
2     4     1  28.1
3     6     0  19.1
4     6     1  20.6
5     8     0  15.0
6     8     1  15.4
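
As a possible alternative, assuming dplyr 1.0.0 or later (where `group_by_at()` and `one_of()` are superseded), the same grouping can be sketched with `across()` and `all_of()`. The name `my_function_across` is hypothetical, and `.groups = "drop"` is only there to return an ungrouped result.

```r
library(dplyr)

# Hypothetical variant of the function above using across() + all_of()
my_function_across <- function(df, col_name) {
  df %>%
    # all_of() selects exactly the columns named in the character vector
    # and errors if any of them is missing from df
    group_by(across(all_of(col_name))) %>%
    summarize(mean = mean(mpg), .groups = "drop")
}

res_across <- my_function_across(mtcars, col_name = c("cyl", "am"))
```

Unlike `one_of()`, which only warns about unknown column names, `all_of()` fails loudly, which may be preferable inside a function.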
tmfmnk