
I would like to write a function that uses dplyr internally and takes variable names as strings. Unfortunately, dplyr's use of NSE (non-standard evaluation) makes this rather complicated. From Programming with dplyr I get the following example:

my_summarise <- function(df, var) {
  var <- enquo(var)

  df %>%
    group_by(!!var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)

However, I would like to write a function where I could provide "g1" instead of g1, and I cannot wrap my head around how to do that.

Raivo Kolde
  • Just use `group_by_` instead of `group_by` and you don't need any of the `enquo` stuff. – Gregor Thomas May 22 '17 at 20:45
  • I realize your example is taken from the doc you link (and seems to be written by an Authoritative Source), but I have to say it seems **terrible** to use `group_by` as the name of the argument that will get passed to the function of the same name. – Gregor Thomas May 22 '17 at 20:46
  • It seems to be deprecated and I would like to understand how this should be done given the new model that hadley figured out – Raivo Kolde May 22 '17 at 20:46
  • edited the `group_by` variable name out of the example – Raivo Kolde May 22 '17 at 20:49
  • Read through the `lazyeval` package vignette or the [dplyr NSE vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html). But NSE is the default, and it is what requires all the quoting/formula stuff. If you want to use a string column name, that is standard evaluation, and you need the standard-evaluating functions that underlie all the NSE counterparts: `group_by_`, `summarize_`, etc. – Gregor Thomas May 22 '17 at 20:52
  • Again, all the `*_` functions seem to be deprecated as of now, and I would like to understand what the right way to do this is. – Raivo Kolde May 22 '17 at 21:06
  • Hmmm, I see. That's disappointing; it seems like it was just 2-3 years ago that `lazyeval` was new and was "the right way to do NSE", and I sort of knew what was going on. – Gregor Thomas May 22 '17 at 22:16
  • @Gregor: When I looked at that webpage the variable name was initially `group_var` and then Hadley tried to dig himself out of the NSE hole that he created. – IRTFM May 22 '17 at 23:50

3 Answers


dplyr >= 1.0

Use a combination of double braces and the across function:

library(dplyr)

# {{ }} forwards the argument; across() lets it be a string or a bare name
my_summarise2 <- function(df, group_var) {
  df %>%
    group_by(across({{ group_var }})) %>%
    summarise(mpg = mean(mpg))
}

my_summarise2(mtcars, "cyl")

# A tibble: 3 x 2
#    cyl   mpg
#  <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1

# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
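If several grouping columns are supplied as a character vector, one possible variation (a sketch, not part of the original answer) is to wrap the vector in `all_of()` inside `across()`. The function name `my_summarise_multi` and the argument `group_vars` are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical helper: group by every column named in a character vector
my_summarise_multi <- function(df, group_vars) {
  df %>%
    # all_of() tells tidyselect that group_vars holds column names
    group_by(across(all_of(group_vars))) %>%
    # .groups = "drop" returns an ungrouped tibble and avoids the regrouping message
    summarise(mpg = mean(mpg), .groups = "drop")
}

res_multi <- my_summarise_multi(mtcars, c("cyl", "am"))
res_multi
```

For mtcars this should give one row per observed combination of cyl and am, with the grouping columns first and the mean mpg last.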

dplyr < 1.0

As far as I know, you could use as.name or sym (from the rlang package; I don't know whether dplyr will eventually import it):

library(dplyr)
my_summarise <- function(df, var) {
  var <- rlang::sym(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

or

my_summarise <- function(df, var) {
  var <- as.name(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# # A tibble: 3 × 2
#     cyl      mpg
#   <dbl>    <dbl>
# 1     4 26.66364
# 2     6 19.74286
# 3     8 15.10000
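Both variants above convert the string into a symbol before unquoting it with `!!`. A quick check, assuming the rlang package is installed, suggests that the two constructors are interchangeable for this purpose (the names `s1` and `s2` are just for illustration):

```r
library(rlang)

s1 <- sym("cyl")      # rlang constructor
s2 <- as.name("cyl")  # base R equivalent

# Both calls yield the same symbol object
identical(s1, s2)
is.symbol(s1)
```

Since the result is the same symbol either way, the choice between them is mostly a matter of whether you want to depend on rlang explicitly.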
Josh Gilfillan
lukeA

Using the .data pronoun from rlang is another option that works directly with column names stored as strings.

The function with .data would look like this:

my_summarise <- function(df, var) {
  df %>%
    group_by(.data[[var]]) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1
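The same pronoun can probably be used for the summarised column too, so that both names arrive as strings. The following is a minimal sketch under that assumption; `my_summarise3`, `group_var`, `value_var` and the output column `avg` are hypothetical names, not from the answer.

```r
library(dplyr)

# Hypothetical variant: grouping column and summarised column both given as strings
my_summarise3 <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    # .data[[...]] looks the column up by name inside the data frame
    summarise(avg = mean(.data[[value_var]]))
}

res_data <- my_summarise3(mtcars, "cyl", "mpg")
res_data
```

The grouping column keeps its original name (cyl here), and the means should match those shown in the output above.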
aosmith

This is how to do it using only dplyr and the very useful as.name function from base R:

my_summarise <- function(df, var) {
  varName <- as.name(var)
  enquo_varName <- enquo(varName)

  df %>%
    group_by(!!enquo_varName) %>%
    summarise(a = mean(a))
}

my_summarise(df, "g1")

Basically, as.name() generates a name (symbol) object that matches var (here var is a string). Then, following Programming with dplyr, enquo() looks at that name and returns it wrapped in a quosure. This quosure can then be unquoted inside the group_by() call using !!.
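The call above relies on a data frame `df` with columns `g1` and `a` that is not defined in the answer. A small self-contained run, using made-up toy data (`toy_df` and its values are assumptions for illustration only), might look like this:

```r
library(dplyr)

# Hypothetical toy data with a grouping column g1 and a numeric column a
toy_df <- tibble(g1 = c(1, 1, 2, 2), a = c(1, 3, 5, 7))

my_summarise <- function(df, var) {
  varName <- as.name(var)           # string -> symbol
  enquo_varName <- enquo(varName)   # symbol -> quosure

  df %>%
    group_by(!!enquo_varName) %>%   # unquote the quosure inside group_by()
    summarise(a = mean(a))
}

res_toy <- my_summarise(toy_df, "g1")
res_toy
```

With this toy data the result has one row per value of g1, holding the mean of a within each group.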

Farid