
I have some code that specifies a grouping variable as a string.

group_var <- "cyl"

My current code for using this grouping variable in a dplyr pipeline is:

library(dplyr)

mtcars %>%
  group_by_(group_var) %>%
  summarize(mean_mpg = mean(mpg))

My best guess as to how to replace the deprecated group_by_ function with group_by is:

mtcars %>%
  group_by(!!as.name(group_var)) %>%
  summarize(mean_mpg = mean(mpg))

This works, but it is not explicitly mentioned in the "Programming with dplyr" vignette.
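The `!!as.name()` pattern can be checked directly against the hard-coded `group_by(cyl)` call. A minimal sketch, assuming dplyr (and rlang, which dplyr installs) is available: `as.name()` turns the string into a symbol, `!!` splices that symbol into `group_by()`, and `rlang::sym()` builds the identical symbol, so either conversion function can be used here.

```r
library(dplyr)

group_var <- "cyl"

# as.name() (base R) and rlang::sym() build the same symbol from the string
same_symbol <- identical(as.name(group_var), rlang::sym(group_var))

# The question's approach: convert the string to a symbol, then unquote with !!
by_string <- mtcars %>%
  group_by(!!as.name(group_var)) %>%
  summarize(mean_mpg = mean(mpg))

# Hard-coded reference: the column name typed directly
by_symbol <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))
```

Both pipelines return one row per value of `cyl`, with the same group means.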

Is using !!as.name() the preferred way to replace group_by_() with group_by()?

Adam Black
  • Another option is `group_by_at`: `group_by_at(mtcars, group_var)`. – aosmith Nov 02 '17 at 17:51
  • This might be helpful: https://stackoverflow.com/questions/47056091/arrange-doesnt-recognize-column-name-parameter/47056273#47056273 – acylam Nov 02 '17 at 19:35
  • You can also use `library(rlang); group_by(!!parse_quosure(group_var))` – acylam Nov 02 '17 at 19:41
  • For context, this is in a Shiny app, and the grouping variable is a user input contained in the variable `input$group_var`. – Adam Black Nov 02 '17 at 19:41
  • @useR Thanks. I think `parse_quosure()` is the function I'm after. `group_by_at` works but doesn't generalize to solving this problem with other tidyverse functions. – Adam Black Nov 02 '17 at 19:54
  • Is there any reason to use `parse_quosure()` over `as.name()`? – Adam Black Nov 02 '17 at 19:55
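One practical difference between parsing a string and calling `as.name()` on it shows up with Shiny input: `as.name()` never parses, so the whole string becomes a single column name, while parsing treats the string as R code that `group_by()` then evaluates. A minimal sketch, assuming dplyr and rlang are installed; `rlang::parse_expr()` stands in here for `parse_quosure()` (both parse the string as code), and `user_input` is a hypothetical value standing in for `input$group_var`.

```r
library(dplyr)
library(rlang)

# Hypothetical user input, e.g. a string typed into a Shiny text box
user_input <- "cyl * 2"

# as.name() does not parse: the whole string becomes one symbol, `cyl * 2`
as_name_result <- as.name(user_input)

# parse_expr() treats the string as R code, giving the call cyl * 2
parsed_result <- parse_expr(user_input)

# Grouping by the parsed call computes a new grouping column from cyl
grouped_parsed <- mtcars %>% group_by(!!parsed_result)

# Grouping by the symbol fails: no column is literally named `cyl * 2`
grouped_name <- try(mtcars %>% group_by(!!as_name_result), silent = TRUE)
```

With `as.name()`, anything other than an existing column name is rejected; with parsing, whatever the user types is evaluated as an expression, which is worth weighing when the string comes from an app user.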

1 Answer


Is this within a function? If not, I think the !!as.name() part is unnecessary, and I would stick with the group_by_at(group_var) suggestion by @aosmith for simplicity's sake. If it is, I would set it up like so:

library(dplyr)

examplr <- function(data, group_var) {
  # Convert the string (e.g. "cyl") to a symbol once; unquote it with !! below
  group_var <- as.name(group_var)

  data %>% 
    group_by(!!group_var) %>% 
    summarize(mean_mpg = mean(mpg))
}

examplr(data = mtcars,
        group_var = "cyl")
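Outside a function, the answer prefers @aosmith's `group_by_at()`, which takes the column name as a plain string. A minimal sketch, assuming dplyr is installed, comparing it with the question's `!!as.name()` route; `group_by_at()` still runs in current dplyr, although the scoped `_at` verbs were marked superseded in dplyr 1.0.0.

```r
library(dplyr)

group_var <- "cyl"

# group_by_at() accepts the column name as a string: no unquoting needed
via_at <- mtcars %>%
  group_by_at(group_var) %>%
  summarize(mean_mpg = mean(mpg))

# The question's approach: convert to a symbol, then unquote with !!
via_bang <- mtcars %>%
  group_by(!!as.name(group_var)) %>%
  summarize(mean_mpg = mean(mpg))
```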
Dave Gruenewald
  • Any reason not to do this `group_by(!!as.name(group_var))`? – Adam Black Nov 02 '17 at 19:45
  • No reason immediately comes to mind, but since you asked for the preferred method, I figured `group_by_at()` is simpler than `group_by(!!as.name())`. But if you plan on using `group_var` multiple times in your `dplyr` pipeline, I would recommend the function approach I outlined above. That way, you only need to refer to your variable as `!!group_var` rather than `!!as.name(group_var)` each time. – Dave Gruenewald Nov 02 '17 at 19:51
  • You can actually use `group_by_at` in a function and pass the variable name as a string directly to it (no "unquoting" needed). – aosmith Nov 02 '17 at 19:51
  • All this new quoting and unquoting stuff is making my head spin. New to me anyway. – Adam Black Nov 02 '17 at 19:59
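The two function-based suggestions in these comments can be sketched side by side: convert the string once and reuse the symbol in several verbs, or pass the string straight to `group_by_at()`. A minimal sketch, assuming dplyr is installed; `summarize_by_sym()` and `summarize_by_at()` are hypothetical helper names, and the `select()` step exists only to show the symbol being reused.

```r
library(dplyr)

# Hypothetical helper: convert once, then reuse the symbol in several verbs
summarize_by_sym <- function(data, group_var) {
  group_sym <- as.name(group_var)
  data %>%
    select(!!group_sym, mpg) %>%   # first use of the symbol
    group_by(!!group_sym) %>%      # reused without another as.name() call
    summarize(mean_mpg = mean(mpg))
}

# Hypothetical helper: pass the string straight through, no unquoting
summarize_by_at <- function(data, group_var) {
  data %>%
    group_by_at(group_var) %>%
    summarize(mean_mpg = mean(mpg))
}

by_sym <- summarize_by_sym(mtcars, "gear")
by_at  <- summarize_by_at(mtcars, "gear")
```

Both helpers take the same string argument and return the same per-group means; the first scales better when the column appears in several verbs, the second avoids quoting entirely.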