
I would like to understand how to pass strings representing expressions into dplyr, so that the variables mentioned in the string are evaluated as expressions on columns in the dataframe. The main vignette on this topic covers passing in quosures, and doesn't discuss strings at all.

It's clear that quosures are safer and clearer than strings for representing expressions, so we should avoid strings whenever quosures can be used instead. However, when working with tools outside the R ecosystem, such as JavaScript or YAML config files, one often has no choice but to work with strings.

For example, say I want a function that does a grouped tally using expressions passed in by the user/caller. As expected, the following code doesn't work, since dplyr uses nonstandard evaluation to interpret the arguments to group_by.

library(tidyverse)

group_by_and_tally <- function(data, groups) {
  data %>%
    group_by(groups) %>%
    tally()
}

my_groups <- c('2 * cyl', 'am')
mtcars %>%
  group_by_and_tally(my_groups)
#> Error in grouped_df_impl(data, unname(vars), drop): Column `groups` is unknown

In dplyr 0.5 we would use standard evaluation, such as group_by_(.dots = groups), to handle this situation. Now that the underscore verbs are deprecated, how should we do this kind of thing in dplyr 0.7?

In the special case of expressions that are just column names we can use the solutions to this question, but they don't work for more complex expressions like 2 * cyl that aren't just a column name.

Hong Ooi
Paul

3 Answers


It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos():

library(tidyverse)
library(rlang)

group_by_and_tally <- function(data, groups) {
  data %>%
    group_by(UQS(groups)) %>%
    tally()
}

my_groups <- quos(2 * cyl, am)
mtcars %>%
  group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups:   2 * cyl [?]
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:

my_groups <- c('2 * cyl', 'am')
my_groups <- my_groups %>% map(parse_quosure)
mtcars %>%
  group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups:   2 * cyl [?]
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
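For readers on a later rlang, where (as far as I know) `parse_quosure()` was deprecated in favour of `parse_quo()`, the same idea can be sketched with `rlang::parse_exprs()` and the `!!!` operator. This is only a minimal sketch, assuming a recent dplyr and rlang; the function name `group_by_and_tally_strings` is made up for illustration and is not part of either package.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: `groups` is a character vector of R expressions,
# e.g. read from a config file or a web form.
group_by_and_tally_strings <- function(data, groups) {
  # Parse each string into an unevaluated expression.
  group_exprs <- parse_exprs(groups)
  data %>%
    group_by(!!!group_exprs) %>%  # splice the expressions into group_by()
    tally()
}

result <- group_by_and_tally_strings(mtcars, c("2 * cyl", "am"))
result
```

Note that parsing runs whatever code the strings contain once the verb evaluates it, so this should only be pointed at input you trust.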

Paul
  • Yeah, but I'm paranoid about the ambiguity with negation that Hadley has alluded to in some places. – Paul Jun 16 '17 at 16:28
  • hmmm, haven't run into any issues yet. I think as long as you use `!!` and `!!!` inside dplyr verbs you should be fine. – yeedle Jun 16 '17 at 16:29
  • You're probably right. Personally I have found this transition pretty confusing, and I've been finding it easier to know what I'm doing when I use `UQ` and `UQS`. – Paul Jun 16 '17 at 16:34
  • In response to a proposed edit, please note that using dots to represent the function arguments causes the string use case to break. – Paul Jun 19 '17 at 16:43
  • User input in the form of strings and character vectors is very common in Shiny apps. I didn't know about `rlang::parse_expr` and `rlang::parse_quosure`. Thanks! I applied your suggestions to input I get in a shiny app at https://groups.google.com/forum/#!topic/manipulatr/UyzWc-s_bos – Vincent Jun 25 '17 at 02:18

It is tempting to use strings but it is almost always better to use expressions. Now that you have quasiquotation, you can easily build up expressions in a flexible way:

library(rlang)

lhs <- "cyl"
rhs <- "disp"
expr(!!sym(lhs) * !!sym(rhs))
#> cyl * disp

vars <- c("cyl", "disp")
expr(sum(!!!syms(vars)))
#> sum(cyl, disp)
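To connect this back to dplyr, an expression built this way can be unquoted inside a verb with `!!`. The following is a small sketch, assuming dplyr and rlang are installed; the names `product_expr` and `product` are illustrative only.

```r
library(dplyr)
library(rlang)

lhs <- "cyl"
rhs <- "disp"

# Build the expression cyl * disp from two column names held as strings.
product_expr <- expr(!!sym(lhs) * !!sym(rhs))

# Unquote the stored expression inside a dplyr verb.
result <- mtcars %>%
  mutate(product = !!product_expr)

head(result$product)
```

This only covers the case where the strings are column names; anything more complex than a name would need to be parsed rather than converted with `sym()`.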
Lionel Henry
  • I understand the preference for expressions over strings, but it doesn't cover all use cases. What if the expressions are not coming from a programmer typing R source code? What if they're coming from a CSV, a YAML/JSON config file, or user-entered form data from a website? – Paul Jun 16 '17 at 16:49
  • I think @lionel's point is that you can turn strings into `syms` which you can then use to build a quosure. – yeedle Jun 16 '17 at 16:55
  • Again, I get that, but that doesn't justify a blanket statement that it is always better to use expressions. This solution doesn't work when the caller is a user outside the R ecosystem and that user wants to pass their own expression which is more complicated than just a column name. – Paul Jun 16 '17 at 17:09
  • yes if the code is coming from outside R, as in any source file, it's ok and necessary to parse ;) Then you can use parse_expr() or parse_quosure(). – Lionel Henry Jun 16 '17 at 18:58
  • btw the blanket statement is justified because programming with strings is a major source of bad R code, and people will use your post to do just that with tidyverse tools. – Lionel Henry Jun 17 '17 at 04:23
  • If you don't give people a good way to do something they need to do, they will figure out a bad way to do it (and blame you). I will make some edits to encourage responsible usage of this technique. – Paul Jun 18 '17 at 18:26
  • User input in the form of strings and character vectors is very common in Shiny apps. Although I'm glad that `rlang::parse_expr` exists I do hope that the _ verbs will **not** be removed from dplyr – Vincent Jun 25 '17 at 02:20
  • They will be deprecated (i.e. still work with a warning) sometime next year. You can often use `!!! syms(x)` to supply character vectors to tidyeval verbs if they are symbols, or `!!! parse_exprs(x)` if they are expressions. – Lionel Henry Jun 26 '17 at 13:03
  • @lionel Could you please elaborate? What makes programming with strings "a major source of bad R code"? – sirallen Mar 06 '18 at 21:19
  • I won't elaborate here, but it's unstructured. So you end up with ad hoc code to structure it (do I need to add a comma there, do I need to escape this character, etc) – Lionel Henry Mar 06 '18 at 21:23
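The distinction drawn in the comments between `!!! syms(x)` and `!!! parse_exprs(x)` can be sketched as follows. This is a rough illustration, assuming a recent dplyr and rlang; the variable names are made up.

```r
library(dplyr)
library(rlang)

# Strings that are plain column names: syms() is enough.
by_names <- mtcars %>% count(!!!syms(c("cyl", "am")))

# Strings that hold arbitrary expressions: they have to be parsed.
by_exprs <- mtcars %>% count(!!!parse_exprs(c("2 * cyl", "am")))

# sym() does not parse its input; it creates a single symbol whose
# name is the whole string, which is why it fails for "2 * cyl".
as_symbol <- sym("2 * cyl")
as_call <- parse_expr("2 * cyl")
is.symbol(as_symbol)
is.call(as_call)
```

Using `syms()` where it suffices avoids evaluating arbitrary code from the input, which seems to be the motivation for preferring it when the strings are known to be column names.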

The friendlyeval package can help with this:

library(tidyverse)
library(friendlyeval)

group_by_and_tally <- function(data, groups) {
  data %>%
    group_by(!!!friendlyeval::treat_strings_as_exprs(groups)) %>%
    tally()
}

my_groups <- c('2 * cyl', 'am')
mtcars %>%
  group_by_and_tally(my_groups)

#> # A tibble: 6 x 3
#> # Groups:   2 * cyl [?]
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2
MilesMcBain