
I do not understand the cost-benefit of NSE (non-standard evaluation) in R for programming. I can see why NSE may be useful for interactive R, but for programming -- i.e. writing reusable scripts and functions -- my experience is that it mainly adds ambiguity, confusion, and hours of debugging, just to save a few user keystrokes.

In almost every case I've seen, including the examples throughout Advanced R, NSE seems to be avoidable by typing a few more characters and using:

  • df$x or df[[x]]
  • "quotes"
  • explicit and/or inline functions
  • do.call

with the obvious benefit of removing ambiguity (see the sketch below).
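
For example, a toy sketch (the data frame here is made up for illustration):

df <- data.frame(x = 1:5, y = letters[1:5])

# NSE: subset() looks up `x` inside `df`
subset(df, x > 3)

# SE equivalents -- a few keystrokes longer, but unambiguous:
df[df$x > 3, ]
col <- "x"
df[df[[col]] > 3, ]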

Can someone provide some convincing examples of when / why NSE is useful for programming in R?

Related post about the undocumented dangers of NSE.

jessexknight

2 Answers


I don't know whether there is an absolutely convincing example of unavoidable NSE code in the terms you're describing (ultimately it is a matter of personal opinion), but I'll cite the vignette "programming on data.table" from data.table, one of the most widely used R packages:

... from its very first releases, enabled the usage of subset and with (or within) functions ... that are useful for reducing repetition in code, enhancing readability, and reducing the total number of characters the user has to type...

The problem with this kind of interface is that we cannot easily parameterize the code that uses it. This is because the expressions passed to those functions are substituted before being evaluated. The easiest workaround is to avoid lazy evaluation in the first place, and fall back to less intuitive, more error-prone approaches like df[["variable"]], etc.
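
To make the quoted point concrete, here is a small sketch (the function and data are made up for illustration) of how an NSE interface resists parameterization, along with the string-based fallback the vignette mentions:

df <- data.frame(cyl = c(4, 6, 8), mpg = c(30, 21, 15))

# Works interactively:
subset(df, cyl > 5)

# Breaks when the column becomes a parameter, because `col > 5`
# is substituted rather than evaluated:
pick <- function(d, col) subset(d, col > 5)
# pick(df, cyl)  # Error: object 'cyl' not found
#                # (or worse: silently uses a global `cyl` if one exists)

# The less intuitive fallback:
pick2 <- function(d, col) d[d[[col]] > 5, , drop = FALSE]
pick2(df, "cyl")
#   cyl mpg
# 2   6  21
# 3   8  15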

So the main benefits of NSE are improved readability and maintainability, not just keystroke saving. And while NSE can sometimes lead you into hard-to-debug code, the non-NSE approach is prone to its own errors.
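
For what it's worth, the vignette goes on to resolve the parameterization problem with the env argument (added around data.table 1.14.2, if I recall the version correctly); a minimal sketch:

library(data.table)
dt <- as.data.table(mtcars)

# The query is written with placeholder symbols, and env= maps
# them to actual column names supplied as strings:
col <- "mpg"
dt[filter_col > 20, .(avg = mean(filter_col)), env = list(filter_col = col)]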

Here is the vignette online: https://rdatatable.gitlab.io/data.table/articles/datatable-programming.html

Here is the source permalink: https://github.com/Rdatatable/data.table/blob/88039186915028ab3c93ccfd8e22c0d1c3534b1a/vignettes/datatable-programming.Rmd

Ric
  • Thanks -- I suppose readability is in the eye of the beholder ... I understand some benefits of reading fewer characters, but the [first answer](https://stackoverflow.com/a/66990726/5228288) in the linked question concludes the exact opposite: _"it produces code that can be hard to read, understand and maintain."_ – jessexknight Jul 26 '23 at 12:24

Since I wrote the answer quoted in your post, it's probably only fair for me to highlight some advantages of NSE. NSE gets mentioned most often in the context of dplyr from the tidyverse, and in that context I would agree that it does not offer much advantage over specifying names as strings (as is done in Python's pandas). But to be fair, the tidyverse developers have done an excellent job enabling both styles of programming by introducing the .data and .env pronouns.
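
For instance, a minimal sketch of those two pronouns (`var` and the shadowing `cyl` below are made up for illustration):

library(dplyr)

var <- "mpg"                                # column name held as a string
mtcars %>% summarise(avg = mean(.data[[var]]))

cyl <- 6                                    # same name as a column
mtcars %>% filter(.data$cyl == .env$cyl)    # column vs. environment, disambiguated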

Where NSE really shines is when you need to capture or manipulate unevaluated expressions. Here are a couple of examples.

1. Computing abstract syntax trees

Abstract Syntax Trees (ASTs) are essential for any tool that wants to parse and/or manipulate code (something that has become more relevant in the age of Large Language Models). NSE makes the task trivial:

getAST <- function(e) {

  # Recursive expansion of callable objects
  f <- function(.e) purrr::map_if(as.list(.e), is.call, f)

  # Capture the input expression and apply the recursive traversal
  f(substitute(e))
}

ast <- getAST(log10(a+5)*b)
str(ast)
# List of 3
#  $ : symbol *
#  $ :List of 2
#   ..$ : symbol log10
#   ..$ :List of 3
#   .. ..$ : symbol +
#   .. ..$ : symbol a
#   .. ..$ : num 5
#  $ : symbol b

2. Capturing expressions

The idea of capturing and storing expressions is actually quite widespread in R. Most built-in modeling functions will do this:

# Generalized linear model
model <- glm(mpg ~ wt, data=mtcars)
model$call
# glm(formula = mpg ~ wt, data = mtcars)

# ANOVA
model <- aov(Sepal.Length ~ Species, data=iris)
model$call
# aov(formula = Sepal.Length ~ Species, data = iris)

This can be useful for a number of reasons, including:

  • Displaying exactly how the function was called for information purposes. This includes plotting. (Try doing plot(x=sin(1:10)) and looking at the y-axis label.)
  • Delaying evaluation. Maybe evaluating the expression is expensive, and you want to make sure that other conditions are satisfied before doing it. In this case, it might make sense to capture and store the expression for (potentially much) later evaluation (see the sketch after this list).
  • Evaluating the same expression in two different contexts, without requiring the user to create a function:
f <- function(expr) {
  c(eval(substitute(expr), list(a=5, b=10)),
    eval(substitute(expr), list(a=1, b=2)))
}

f(a+b)   # [1] 15  3
f(a*b)   # [1] 50  2
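
To flesh out the delayed-evaluation bullet above, here is a minimal sketch (the names delay and slow are hypothetical):

delay <- function(expr) {
  e <- substitute(expr)                # capture without evaluating
  function() eval(e, parent.frame())   # evaluate only when the closure is called
}

slow <- delay({ Sys.sleep(1); "done" })  # returns immediately; nothing runs yet
slow()                                   # the one-second cost is paid only here
# [1] "done"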

Of course, all of the above can be done with standard evaluation, but I argue that in some cases it produces more complex code that would be harder to read and maintain.

Artem Sokolov