Why is enquo + !! preferable to substitute + eval

Question

In the following example, why should we favour using f1 over f2? Is it more efficient in some sense? For someone used to base R, it seems more natural to use the "substitute + eval" option.

library(dplyr)

d = data.frame(x = 1:5,
               y = rnorm(5))

# using enquo + !!
f1 = function(mydata, myvar) {
  m = enquo(myvar)
  mydata %>%
    mutate(two_y = 2 * !!m)
}

# using substitute + eval    
f2 = function(mydata, myvar) {
  m = substitute(myvar)
  mydata %>%
    mutate(two_y = 2 * eval(m))
}

all.equal(d %>% f1(y), d %>% f2(y)) # TRUE

In other words, and beyond this particular example, my question is: can I get away with programming using dplyr NSE functions with good ol' base R like substitute+eval, or do I really need to learn to love all those rlang functions because there is a benefit to it (speed, clarity, compositionality,...)?

I think the world would be a better place if the `dplyr::` ppl would **just allow us to pass variable names as character strings**, as in the old underscored variants like `mutate_()`. imo, an even better option would be to have an argument like `colnames_as_strings=TRUE` for `mutate()` et al... that would make it straightforward to use dplyr both interactively and in software. But until then, welcome to `enquo()`/`!!` hell... — lefft, Apr 06 '18 at 21:39
tl;dr: the `enquo()` strategy really only makes sense if you are deeply committed to being able to pass column names without quotes (unclear to me why that's important but oh well). could be that there's some fundamental reason that requires understanding dplyr's internals to grasp... — lefft, Apr 06 '18 at 21:43
@lefft I’ve been told that passing column names as characters is “dangerous and unreliable”, but I’ve never gotten a convincing explanation for why that is except in cases that seem bizarrely rare to me. I suppose if you encounter those edge cases routinely it makes more sense, it’s just weird to me bc I don’t think I ever have. — joran, Apr 06 '18 at 22:29
@joran yeah i can imagine if one is mixing standard and non-standard evaluation there could be problems -- but ya totally agreed, i remain unconvinced re. the "dangerous and unreliable" bit (in fact i'd say that passing names *without* quotes is more dangerous + unreliable, as with `base::subset()`!) — lefft, Apr 06 '18 at 22:47
@lefft No that's shit. It doesn't actually solve anything, or make anything easier. Also, look up "stringly typed". You're suggesting to subvert the type system. That's a priori a bad idea. — Konrad Rudolph, Apr 06 '18 at 22:48
@KonradRudolph i'm suggesting to allow character-based selection/subsetting in a language whose definition uses that convention... — lefft, Apr 06 '18 at 22:50
@KonradRudolph The only thing I feel knowledgeable enough to comment on at this point is that your case maybe isn’t helped by that first sentence. — joran, Apr 06 '18 at 22:58
@lefft You're suggesting to allow strings instead of variables inside expressions (or expressions inside strings? That's even worse). That's an important difference. Nobody is talking about merely selecting columns. — Konrad Rudolph, Apr 06 '18 at 22:58
okay one last thought: motivation comes from inability to pass a character vector to `group_by()`, `select()`, and `mutate_at()/summarize_at()`. When colnames aren't (or can't) be known in advance, it can be a pain to write good split-apply-combine functions in dplyr. Sometimes even feels easier to use `base::tapply()`, precisely because you can specify grouping cols as character strings that you pass as a parameter... In the specific case OP showed, it would of course be terrible if `"m"` meant `mydata$m` (or whenever a colname is used on the rhs of `=` inside a dplyr table func). — lefft, Apr 07 '18 at 00:21
(fwiw i love `dplyr::` and use it every day -- i just want it to be the best it can be!) — lefft, Apr 07 '18 at 00:23
@lefft No, that’s no problem at all. Just use `group_by(data, !! var)`. I honestly fail to see the difficulty. It’s a simple, clean, consistent, *yet powerful* abstraction. It’s thus diametrically opposite to what `tapply` etc offer. — Konrad Rudolph, Apr 07 '18 at 20:03
@joran Annoyance got the better of me. But your comment illustrates a permanent problem in this debate: people are paying exclusive attention to tone, rather than contents. Facts don’t seem to matter. I might try to use different words but it wouldn’t change anything: a comment with a technically bad (tried, tested, and found wanting) solution got lots of upvotes. My comment which, besides foul language, offered pointers and factual arguments against it, was disregarded. — Konrad Rudolph, Apr 07 '18 at 20:06
@KonradRudolph fwiw I believe you (if for no other reason than I know you know a lot more about this than me). I was merely trying to nudge the tone in a different direction. — joran, Apr 07 '18 at 20:17

Artem Sokolov · Accepted Answer · 2019-09-18T17:49:25.027

I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo over substitute. Both capture the expression that was supplied to a function argument rather than its value. The difference is that substitute() only sees one level up the call stack, while enquo() combined with !! lets each function forward the original expression (together with its environment) through any number of nested calls.

Consider a simple function that uses substitute():

f <- function( myExpr ) {
  eval( substitute(myExpr), list(a=2, b=3) )
}

f(a+b)   # 5
f(a*b)   # 6

This functionality breaks when the call is nested inside another function:

g <- function( myExpr ) {
  val <- f( substitute(myExpr) )
  ## Do some stuff
  val
}

g(a+b)
# myExpr     <-- OOPS
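To see why the nested call fails, it can help to look at what f() actually receives. The sketch below uses two hypothetical helpers, f_show() and g_show() (illustrative names, not part of the original answer), that return the captured expression instead of evaluating it.

```r
# Hypothetical helpers for illustration only: they return the
# captured expression rather than evaluating it.
f_show <- function(myExpr) substitute(myExpr)

g_show <- function(myExpr) f_show(substitute(myExpr))

direct <- f_show(a + b)   # the expression as typed by the user
nested <- g_show(a + b)   # the code that g_show() wrote in its call to f_show()

print(direct)
print(nested)
```

In the nested case the inner function captures the literal call substitute(myExpr), not a + b. When f() later evaluates that call inside list(a=2, b=3), myExpr is not bound there, so the bare symbol myExpr comes back.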

Now consider the same functions re-written using enquo():

library( rlang )

f2 <- function( myExpr ) {
  eval_tidy( enquo(myExpr), list(a=2, b=3) )
}

g2 <- function( myExpr ) {
  val <- f2( !!enquo(myExpr) )
  val
}

g2( a+b )    # 5
g2( b/a )    # 1.5

And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.
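A related point, sketched here as an aside rather than as part of the original answer: a quosure remembers the environment in which the expression was written, so a variable inside the function cannot accidentally shadow one the caller meant. The functions with_base() and with_quo() and the variable n are hypothetical names chosen for this illustration.

```r
library(rlang)

# Both hypothetical functions define an internal variable `n`.
with_base <- function(expr) {
  n <- 100
  # With a list as `envir`, eval() falls back to this function's frame,
  # so the internal `n` is found before the caller's.
  eval(substitute(expr), list(a = 1))
}

with_quo <- function(expr) {
  n <- 100
  # The quosure carries the caller's environment, so the caller's `n` is used.
  eval_tidy(enquo(expr), list(a = 1))
}

n <- 5
base_result <- with_base(a + n)
quo_result  <- with_quo(a + n)

print(base_result)  # 101: the internal n leaked into the user's expression
print(quo_result)   # 6: the caller's n was used
```

The base version can be repaired by choosing the enclosing environment explicitly (for example with eval.parent()), but that has to be done by hand in every function.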

UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively shorthand for !!enquo(). This allows us to simplify the definition of g2 to

g2 <- function( myExpr ) {
  val <- f2( {{myExpr}} )
  val
}
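Applied to the function from the question, the same shorthand might look like the sketch below. It assumes rlang 0.4.0 or later together with a dplyr version that supports {{ ; f1_curly and the small data frame are hypothetical, made up for the illustration.

```r
library(dplyr)

# Hypothetical rewrite of the question's f1() using {{ }}.
f1_curly <- function(mydata, myvar) {
  mydata %>%
    mutate(two_y = 2 * {{ myvar }})
}

# Deterministic example data (illustrative values only).
d_demo <- data.frame(x = 1:5, y = c(0.5, -1, 2, 0, 3))

res_curly <- f1_curly(d_demo, y)
print(res_curly)
```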

Great answer man, this was what I was looking for. Many thanks. — mbiron, Nov 09 '18 at 23:11

Tung · Answer 2 · 2018-04-07T00:09:44.060

enquo() and !! also allow you to program with other dplyr verbs such as group_by and select. I'm not sure whether substitute and eval can do that. Take a look at this example, where I modify your data frame a little:

library(dplyr)

set.seed(1234)
d = data.frame(x = c(1, 1, 2, 2, 3),
               y = rnorm(5),
               z = runif(5))

# select, group_by & create a new output name based on input supplied
my_summarise <- function(df, group_var, select_var) {

  group_var <- enquo(group_var)
  select_var <- enquo(select_var)

  # create new name
  mean_name <- paste0("mean_", quo_name(select_var))

  df %>%
    select(!!select_var, !!group_var) %>% 
    group_by(!!group_var) %>%
    summarise(!!mean_name := mean(!!select_var))
}

my_summarise(d, x, z)

# A tibble: 3 x 2
      x mean_z
  <dbl>  <dbl>
1    1.  0.619
2    2.  0.603
3    3.  0.292

Edit: enquos() and !!! also make it easier to capture a list of variables.

# example
grouping_vars <- quos(x, y)
d %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_z = mean(z))

# A tibble: 5 x 3
# Groups:   x [?]
      x      y mean_z
  <dbl>  <dbl>  <dbl>
1    1. -1.21   0.694
2    1.  0.277  0.545
3    2. -2.35   0.923
4    2.  1.08   0.283
5    3.  0.429  0.292
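As a small aside, what enquos() captures can be inspected without dplyr. The sketch below assumes a version of rlang that provides as_label(); capture_all() is a hypothetical helper used only for this illustration.

```r
library(rlang)

# Hypothetical helper: capture every argument passed through `...`.
capture_all <- function(...) enquos(...)

qs <- capture_all(x, y + 1)

# Each element is a quosure; as_label() turns it into readable text.
captured_labels <- unname(vapply(qs, as_label, character(1)))

print(length(qs))
print(captured_labels)
```

The resulting list is what !!! splices into a call such as group_by(), one quosure per argument.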


# in a function
my_summarise2 <- function(df, select_var, ...) {

  group_var <- enquos(...)
  select_var <- enquo(select_var)

  # create new name
  mean_name <- paste0("mean_", quo_name(select_var))

  df %>%
    select(!!select_var, !!!group_var) %>% 
    group_by(!!!group_var) %>%
    summarise(!!mean_name := mean(!!select_var))
}

my_summarise2(d, z, x, y)

# A tibble: 5 x 3
# Groups:   x [?]
      x      y mean_z
  <dbl>  <dbl>  <dbl>
1    1. -1.21   0.694
2    1.  0.277  0.545
3    2. -2.35   0.923
4    2.  1.08   0.283
5    3.  0.429  0.292

Credit: Programming with dplyr
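With more recent releases the same function can be written more compactly. The following is only a sketch under the assumption of dplyr 1.0 or later (for the .groups argument and the glue-style name on the left of :=); my_summarise_curly and the example data are hypothetical.

```r
library(dplyr)

# Hypothetical variant of my_summarise2(): `...` is forwarded as is,
# and {{ }} replaces the enquo() + !! pair.
my_summarise_curly <- function(df, select_var, ...) {
  df %>%
    group_by(...) %>%
    summarise("mean_{{ select_var }}" := mean({{ select_var }}),
              .groups = "drop")
}

# Deterministic example data (illustrative values only).
d_demo <- data.frame(x = c(1, 1, 2, 2, 3),
                     z = c(0.2, 0.4, 0.6, 0.8, 1.0))

res_summary <- my_summarise_curly(d_demo, z, x)
print(res_summary)
```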

Thanks! It would be nice to see if substitute+eval could work in those cases too though. In the end, my question was basically: can I get away with programming using dplyr NSE functions with good ol' substitute+eval, or do I really need to learn to love all those `rlang` functions you mentioned because there is a benefit to it? — mbiron, Apr 07 '18 at 00:57
@mbiron: I'm curious to see a solution using `substitute+eval` too. IMO if you're using a lot of `tidyverse` packages then it's worth learning about `tidyeval`, as Hadley and other devs are pushing in that direction. [Here](https://stackoverflow.com/a/49470372/786542) is an example parsing input strings into `dplyr`. [Another example](https://stackoverflow.com/a/49647973/786542) using `tidyeval` in `ggplot2` — Tung, Apr 07 '18 at 03:03
@mbiron Of course you can theoretically use `eval` and `substitute` here. But the solutions would be painfully complex and complicated. {rlang}’s contribution is to generalise, formalise and simplify the solution by building on existing computer science research. — Konrad Rudolph, Apr 07 '18 at 20:10

score 5 · Answer 3 · answered Apr 06 '18 at 21:45

Imagine there is a different x you want to multiply:

> x <- 3
> f1(d, !!x)
  x            y two_y
1 1 -2.488894875     6
2 2 -1.133517746     6
3 3 -1.024834108     6
4 4  0.730537366     6
5 5 -1.325431756     6

vs without the !!:

> f1(d, x)
  x            y two_y
1 1 -2.488894875     2
2 2 -1.133517746     4
3 3 -1.024834108     6
4 4  0.730537366     8
5 5 -1.325431756    10

!! gives the caller more control over scoping than substitute(): with substitute() only the second behaviour (x taken from the data frame) is easy to get.
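The same behaviour can be reproduced without dplyr. This is a minimal sketch using rlang only; double_it() and the data are hypothetical and stand in for f1() from the question.

```r
library(rlang)

# Hypothetical stand-in for f1(): doubles whatever `var` refers to,
# looking it up in `data` first.
double_it <- function(data, var) {
  eval_tidy(expr(2 * !!enquo(var)), data)
}

d_demo <- data.frame(x = 1:3)
x <- 10

col_result <- double_it(d_demo, x)    # `x` resolves to the column
val_result <- double_it(d_demo, !!x)  # the caller's value of x is inlined first

print(col_result)
print(val_result)
```

Because the argument is captured by enquo(), the caller decides at the call site whether x means the column or the surrounding variable.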

Neal Fultz

I see. It seems related to something that shows up in [this blog post](https://www.r-bloggers.com/non-standard-evaluation-and-function-composition-in-r/): `!!` deals better with composition of functions that use NSE. Still, the examples seem a bit awkward – mbiron Apr 06 '18 at 22:46

score 4 · Answer 4 · answered Oct 04 '19 at 15:50

To add some nuance, these things are not necessarily that complex in base R.

It is important to remember to use eval.parent() where relevant, so that substituted arguments are evaluated in the right environment. If you use eval.parent() properly, the expressions in nested calls will find their way. If you don't, you might discover environment hell :).
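A minimal sketch of that difference, using a hypothetical function scale_expr() whose internal variable k clashes with one defined by the caller:

```r
# Hypothetical function for illustration: `k` is an internal variable.
scale_expr <- function(expr) {
  k <- 1000
  # eval() defaults to this function's own frame, so it sees k = 1000.
  own_frame <- eval(substitute(expr))
  # eval.parent() evaluates in the caller's frame, so it sees the caller's k.
  caller <- eval.parent(substitute(expr))
  c(own_frame = own_frame, caller = caller)
}

k <- 2
res_scale <- scale_expr(k * 3)
print(res_scale)  # own_frame = 3000, caller = 6
```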

The base toolbox that I use is made of quote(), substitute(), bquote(), as.call(), and do.call() (the latter is useful in combination with substitute()).
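For readers less familiar with these functions, here is a short sketch of what each one does. All variable names are made up for the illustration.

```r
col <- quote(z)                       # quote(): capture a symbol unevaluated

# bquote(): build a call, filling in .( ) with the value of `col`
call_bquote <- bquote(mean(.(col)))

# as.call(): build the same call from a list of its components
call_as_call <- as.call(list(quote(mean), col))

# Evaluate the constructed call with z supplied in a list
value <- eval(call_bquote, list(z = c(1, 2, 3)))

# do.call() + substitute(): rewrite symbols inside an existing call
renamed <- do.call("substitute", list(call_bquote, list(z = quote(w))))

print(call_bquote)   # mean(z)
print(call_as_call)  # mean(z)
print(value)         # 2
print(renamed)       # mean(w)
```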

Without going into details, here is how to solve in base R the cases presented by @Artem and @Tung, without any tidy evaluation, and then the last example, not using quo / enquo, but still benefiting from splicing and unquoting (!!! and !!).

We'll see that splicing and unquoting make the code nicer (but require functions that support them!), and that in the present cases using quosures doesn't improve things dramatically (though it arguably still helps).

solving Artem's case with base R

f0 <- function( myExpr ) {
  eval(substitute(myExpr), list(a=2, b=3))
}

g0 <- function( myExpr ) {
  val <- eval.parent(substitute(f0(myExpr)))
  val
}

f0(a+b)
#> [1] 5
g0(a+b)
#> [1] 5

solving Tung's 1st case with base R

my_summarise0 <- function(df, group_var, select_var) {

  group_var  <- substitute(group_var)
  select_var <- substitute(select_var)

  # create new name
  mean_name <- paste0("mean_", as.character(select_var))

  eval.parent(substitute(
  df %>%
    select(select_var, group_var) %>% 
    group_by(group_var) %>%
    summarise(mean_name := mean(select_var))))
}

library(dplyr)
set.seed(1234)
d = data.frame(x = c(1, 1, 2, 2, 3),
               y = rnorm(5),
               z = runif(5))
my_summarise0(d, x, z)
#> # A tibble: 3 x 2
#>       x mean_z
#>   <dbl>  <dbl>
#> 1     1  0.619
#> 2     2  0.603
#> 3     3  0.292
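To make the mechanism visible, the sketch below returns the call that substitute() builds instead of evaluating it. show_call() is a hypothetical helper with a shortened pipeline; nothing is evaluated, so dplyr does not need to be loaded and d does not need to exist.

```r
# Hypothetical helper: same substitution trick as my_summarise0(),
# but the constructed call is returned rather than evaluated.
show_call <- function(df, group_var, select_var) {
  group_var  <- substitute(group_var)
  select_var <- substitute(select_var)
  # substitute() replaces df, group_var and select_var by what they hold.
  substitute(summarise(group_by(df, group_var), mean(select_var)))
}

built <- show_call(d, x, z)
print(built)  # summarise(group_by(d, x), mean(z))
```

eval.parent() then runs this fully spelled-out call in the caller's environment, which is why the column names resolve as if they had been typed there directly.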

solving Tung's 2nd case with base R

grouping_vars <- c(quote(x), quote(y))
eval(as.call(c(quote(group_by), quote(d), grouping_vars))) %>%
  summarise(mean_z = mean(z))
#> # A tibble: 5 x 3
#> # Groups:   x [3]
#>       x      y mean_z
#>   <dbl>  <dbl>  <dbl>
#> 1     1 -1.21   0.694
#> 2     1  0.277  0.545
#> 3     2 -2.35   0.923
#> 4     2  1.08   0.283
#> 5     3  0.429  0.292

in a function:

my_summarise02 <- function(df, select_var, ...) {

  group_var  <- eval(substitute(alist(...)))
  select_var <- substitute(select_var)

  # create new name
  mean_name <- paste0("mean_", as.character(select_var))

  df %>%
    {eval(as.call(c(quote(select),quote(.), select_var, group_var)))} %>% 
    {eval(as.call(c(quote(group_by),quote(.), group_var)))} %>%
    {eval(bquote(summarise(.,.(mean_name) := mean(.(select_var)))))}
}

my_summarise02(d, z, x, y)
#> # A tibble: 5 x 3
#> # Groups:   x [3]
#>       x      y mean_z
#>   <dbl>  <dbl>  <dbl>
#> 1     1 -1.21   0.694
#> 2     1  0.277  0.545
#> 3     2 -2.35   0.923
#> 4     2  1.08   0.283
#> 5     3  0.429  0.292

solving Tung's 2nd case with base R but using `!!` and `!!!`

grouping_vars <- c(quote(x), quote(y))

d %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_z = mean(z))
#> # A tibble: 5 x 3
#> # Groups:   x [3]
#>       x      y mean_z
#>   <dbl>  <dbl>  <dbl>
#> 1     1 -1.21   0.694
#> 2     1  0.277  0.545
#> 3     2 -2.35   0.923
#> 4     2  1.08   0.283
#> 5     3  0.429  0.292

in a function :

my_summarise03 <- function(df, select_var, ...) {

  group_var  <- eval(substitute(alist(...)))
  select_var <- substitute(select_var)

  # create new name
  mean_name <- paste0("mean_", as.character(select_var))

  df %>%
    select(!!select_var, !!!group_var) %>% 
    group_by(!!!group_var) %>%
    summarise(.,!!mean_name := mean(!!select_var))
}

my_summarise03(d, z, x, y)
#> # A tibble: 5 x 3
#> # Groups:   x [3]
#>       x      y mean_z
#>   <dbl>  <dbl>  <dbl>
#> 1     1 -1.21   0.694
#> 2     1  0.277  0.545
#> 3     2 -2.35   0.923
#> 4     2  1.08   0.283
#> 5     3  0.429  0.292

Of course we could also use the `*_at()` variants, but it's beside the point here — moodymudskipper, Oct 04 '19 at 16:19
