Programming with dplyr using string as input

Question

I would like to write a function that uses dplyr internally and takes variable names as strings. Unfortunately, dplyr's use of NSE makes this rather complicated. From Programming with dplyr I get the following example:

my_summarise <- function(df, var) {
  var <- enquo(var)

  df %>%
    group_by(!!var) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)

However, I would like to write a function where, instead of g1, I could pass "g1", and I cannot work out how to do that.

Just use `group_by_` instead of `group_by` and you don't need any of the `enquo` stuff. — Gregor Thomas, May 22 '17 at 20:45
I realize your example is taken from the doc you link (and seems to be written by an Authoritative Source), but I have to say it seems **terrible** to use `group_by` as the name of the argument that will get passed to the function of the same name. — Gregor Thomas, May 22 '17 at 20:46
It seems to be deprecated and I would like to understand how this should be done given the new model that hadley figured out — Raivo Kolde, May 22 '17 at 20:46
Read through the `lazyeval` package vignette or the [dplyr NSE vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html). But NSE is the default that requires all the quoting/formula stuff. If you want to use a string column name then that is standard evaluation, and you need to use the standard-evaluating functions that underlie all the NSE counterparts, `group_by_`, `summarize_`, etc. — Gregor Thomas, May 22 '17 at 20:52
Again all the `*_` seem to be deprecated as of now and I would like to understand what is the right way to do this. — Raivo Kolde, May 22 '17 at 21:06
Hmmm, I see. That's disappointing, seems like it was just 2-3 years ago that `lazyeval` was new and was "the right way to do NSE", and I sort of knew what was going on. — Gregor Thomas, May 22 '17 at 22:16
@Gregor: When I looked at that webpage the variable name was initially `group_var` and then Hadley tried to dig himself out of the NSE hole that he created. — IRTFM, May 22 '17 at 23:50

score 18 · Accepted Answer · edited Jul 17 '21 at 10:16

dplyr >= 1.0

Use a combination of double braces and the across function:

my_summarise2 <- function(df, group_var) {
  df %>%
    group_by(across({{ group_var }})) %>%
    summarise(mpg = mean(mpg))
}

my_summarise2(mtcars, "cyl")

# A tibble: 3 x 2
#    cyl   mpg
#  <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1

# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
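
If the column names arrive as a character vector (possibly more than one), one option is to wrap the vector in all_of() inside across(). This is only a sketch: the helper name my_summarise_chr and its group_vars argument are made up for illustration, and it assumes dplyr >= 1.0.

```r
library(dplyr)

# Hypothetical helper: `group_vars` is a character vector of column names.
my_summarise_chr <- function(df, group_vars) {
  df %>%
    # all_of() selects exactly the columns named in the character vector
    group_by(across(all_of(group_vars))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

# One row per combination of cyl and am
res_chr <- my_summarise_chr(mtcars, c("cyl", "am"))
```

all_of() makes it explicit that the strings are column names and raises an error if one of them is missing from the data.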

dplyr < 1.0

As far as I know, you could use as.name or sym (from the rlang package - I don't know if dplyr will import it eventually):

library(dplyr)
my_summarise <- function(df, var) {
  var <- rlang::sym(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

or

my_summarise <- function(df, var) {
  var <- as.name(var)
  df %>%
    group_by(!!var) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 × 2
#     cyl      mpg
#   <dbl>    <dbl>
# 1     4 26.66364
# 2     6 19.74286
# 3     8 15.10000
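
The same idea should extend to several string column names by using rlang::syms() together with the !!! (unquote-splice) operator. A minimal sketch, with my_summarise_syms as a made-up name:

```r
library(dplyr)

# Hypothetical helper: `vars` is a character vector of column names.
my_summarise_syms <- function(df, vars) {
  # syms() turns each string into a symbol and returns them as a list
  vars <- rlang::syms(vars)
  df %>%
    # !!! splices the list of symbols in as separate grouping variables
    group_by(!!!vars) %>%
    summarise(mpg = mean(mpg))
}

# One row per combination of cyl and am
res_syms <- my_summarise_syms(mtcars, c("cyl", "am"))
```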

score 2 · Answer 2 · answered Oct 02 '19 at 17:30

Using the .data pronoun from rlang is another option that works directly with column names stored as strings.

The function with .data would look like this:

my_summarise <- function(df, var) {
  df %>%
    group_by(.data[[var]]) %>%
    summarise(mpg = mean(mpg))
}

my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  26.7
# 2     6  19.7
# 3     8  15.1
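
The .data pronoun can also be used for the summarised column, so both names can be passed as strings. A sketch under that assumption; the function name my_summarise_data, the value_var argument and the mean_value output column are made up for illustration:

```r
library(dplyr)

# Hypothetical helper: both the grouping column and the
# column to average are supplied as strings.
my_summarise_data <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    # .data[[value_var]] looks up the column named by the string
    summarise(mean_value = mean(.data[[value_var]]))
}

res_data <- my_summarise_data(mtcars, "cyl", "mpg")
```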

Farid · Answer 3 · 2021-02-13T09:25:57.953

This is how to do it using only dplyr and the very useful as.name function from base R:

my_summarise <- function(df, var) {
  varName <- as.name(var)
  enquo_varName <- enquo(varName)

  df %>%
    group_by(!!enquo_varName) %>%
    summarise(a = mean(a))
}

my_summarise(df, "g1")

Basically, with as.name() we generate a name object that matches var (here var is a string). Then, following Programming with dplyr, we use enquo() to look at that name and return the associated value as a quosure. This quosure can then be unquoted inside the group_by() call using !!.
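
To see what as.name() actually produces, here is a small base R check (no packages needed; nm is just an illustrative variable name):

```r
# as.name() converts a string into a symbol (a "name" object)
nm <- as.name("g1")

is_sym  <- is.symbol(nm)            # is it a symbol?
same_as <- identical(nm, quote(g1)) # same object as typing g1 unquoted?
```

Both checks evaluate to TRUE, which is why the resulting symbol can stand in for a bare column name once it is unquoted with !!.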
