32

Coming from a C / Python / Java background, I have trouble understanding some R syntax, where literals look like variables, but seem to behave like strings. For example:

library(ggplot2)
library("ggplot2")

The two lines behave equivalently. However, I would expect the first line to mean "load the library whose name is stored in the ggplot2 variable" and give an error like object 'ggplot2' not found.

Speaking of ggplot2:

ggplot(data, aes(factor(arrivalRate), responseTime, fill=factor(mode))) +
  geom_violin(trim=FALSE, position=dodge)

The variables arrivalRate, responseTime and mode do not exist, but somehow R knows to look them up inside the data data frame. I assume that aes actually receives strings that are then processed using something like eval.

How does R parse code such that it ends up interpreting some literals as strings?

user1202136
  • 1
    Well, just type `library` in the console and you will find in the code a line package <- as.character(substitute(package)) that is doing just that: ensuring that both cases give the same result. (that was for the first part of the question..) – agenis Nov 24 '17 at 22:46
  • 2
    Welcome to the crazy world of R. R's scoping rules have a way of figuring out what you mean in a way that most languages can't. It's great right up until it guesses wrong and finding the source of the problem becomes really hard. – Andrew Brēza Nov 25 '17 at 00:07
  • 1
    Another very basic but often overlooked example is how `plot(runif(10),rnorm(10))` creates the axis labels for the resulting plot using `substitute`. – joran Nov 25 '17 at 01:04
  • 1
    By the way, R is not the only language behaving in unexpected ways in this respect. In PHP for instance, `echo something` outputs the text "something" if there is no constant defined by that name. – Mr Lister Nov 25 '17 at 17:45

4 Answers

28

promises

When an argument is passed to a function, it is passed not as a value but as a promise, which consists of:

  • the expression or code that the caller uses as the actual argument
  • the environment in which that expression is to be evaluated, viz. the caller's environment.
  • the value that the expression represents when the expression is evaluated in the promise's environment -- this slot is not filled in until the promise is actually evaluated. It will never be filled in if the function never accesses it.

The pryr package can show the info in a promise:

library(pryr)

g <- function(x) promise_info(x)
g(ggplot2)

giving:

$code
ggplot2  <-- the promise x represents the expression ggplot2

$env
<environment: R_GlobalEnv>  <-- if evaluated it will be done in this environment

$evaled
[1] FALSE  <-- it has not been evaluated

$value
NULL  <-- not filled in because promise has not been evaluated

Of the slots in the pryr output above, the only one that can be accessed at the R level, without writing a C function (or using a package such as pryr that wraps such C code), is the code slot. It can be retrieved with the R function substitute(x) (or by other means). In terms of the pryr output, substitute applied to a promise returns the code slot without evaluating the promise; that is, the value slot is not modified. Had we accessed x in the ordinary way, i.e. not via substitute, the code would have been evaluated in the promise's environment, stored in the value slot, and then passed to the expression in the function that accesses it.

Thus both of the following calls return a character string representing what was passed as an expression, i.e. the character representation of the code slot, as opposed to its value.

f <- function(x) as.character(substitute(x))
f("ggplot2")
## [1] "ggplot2"
f(ggplot2)
## [1] "ggplot2"
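The laziness described above can also be checked in base R, without pryr. The sketch below is only an illustration; the function names ignore_arg, peek_code and use_value are made up for this example.

```r
# ignore_arg, peek_code and use_value are hypothetical names for this sketch.

# The argument is never accessed, so its promise is never evaluated
# and the stop() call never runs.
ignore_arg <- function(x) "x was never touched"
ignore_arg(stop("this error is never raised"))
## [1] "x was never touched"

# substitute() returns the promise's code without forcing it.
peek_code <- function(x) substitute(x)
peek_code(stop("still not raised"))
## stop("still not raised")

# Ordinary access forces the promise in the caller's environment.
use_value <- function(x) x
y <- 10
use_value(y + 1)
## [1] 11
```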

library

In fact, library uses this idiom, i.e. as.character(substitute(package)), to handle its first argument, package, unless it is called with character.only = TRUE.
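This is also why library does not behave the way the question expects when the package name really is stored in a variable. The sketch below mimics only the name handling; load_name is a hypothetical stand-in that loads nothing, while character.only is a real argument of library.

```r
# load_name is a hypothetical helper that imitates how library()
# turns its first argument into a package name.
load_name <- function(package, character.only = FALSE) {
  # Unless told otherwise, use the code the caller typed, not its value.
  if (!character.only) package <- as.character(substitute(package))
  package
}

load_name(ggplot2)
## [1] "ggplot2"
load_name("ggplot2")
## [1] "ggplot2"

pkg <- "ggplot2"
load_name(pkg)                         # the symbol's name, not its value
## [1] "pkg"
load_name(pkg, character.only = TRUE)  # the variable is evaluated normally
## [1] "ggplot2"
```

With the real function, library(pkg, character.only = TRUE) is the way to load a package whose name is held in a variable.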

aes

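The aes function uses match.call to get the entire call as an expression, so in a sense it is an alternative to substitute. (Since ggplot2 3.0.0, aes captures its arguments with rlang's enquo and enquos instead, but the idea of capturing code rather than values is the same.) For example: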

h <- function(x) match.call()
h(pi + 3)
## h(x = pi + 3)
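Capturing the code is only half of what the question asks about; the captured expression still has to be evaluated with the data frame's columns in scope. A minimal base R sketch of that step follows. The helper col_mean and the variable multiplier are made up for illustration, and this is not ggplot2's actual implementation.

```r
# col_mean is a hypothetical helper: it captures expr unevaluated, then
# evaluates it with data's columns visible, falling back to the caller's
# environment for any other names.
col_mean <- function(data, expr) {
  e <- substitute(expr)
  mean(eval(e, data, parent.frame()))
}

df <- data.frame(responseTime = c(1, 2, 3, 6))
multiplier <- 10

# responseTime is found in df, multiplier in the calling environment.
col_mean(df, responseTime * multiplier)
## [1] 30
```

No strings are involved at any point: the function receives code, and eval decides where the names in that code are looked up.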

Note

One cannot tell without looking at the documentation or code of a function how it will treat its arguments.

G. Grothendieck
20

An interesting quirk of the R language is the way it evaluates expressions. In most cases, R behaves the way you'd expect: expressions in quotes are treated as strings, and anything else is treated as a variable, function, or other token. But some functions allow for "non-standard evaluation" (NSE), in which an unquoted expression is captured by the function instead of being evaluated first, so it can be handled, more or less, as if it had been quoted. The most common examples are R's way of loading libraries (which accepts quoted or unquoted library names) and its succinct formula interface. Other packages can take advantage of NSE too; Hadley Wickham makes extensive use of it throughout his extremely popular tidyverse packages. Aside from saving the user a few characters of typing, NSE has a number of useful properties for metaprogramming.
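The formula interface mentioned above is easy to inspect in base R. In this small sketch the variable names are borrowed from the question and do not exist as objects; the formula simply stores them as unevaluated symbols.

```r
# Neither responseTime nor arrivalRate exists as a variable here,
# yet creating the formula raises no error.
fml <- responseTime ~ arrivalRate
class(fml)
## [1] "formula"
all.vars(fml)
## [1] "responseTime" "arrivalRate"

# The left-hand side is stored as a symbol, not as a value or a string.
fml[[2]]
## responseTime
is.name(fml[[2]])
## [1] TRUE
```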

As noted in the other answer, Wickham has an excellent tutorial on how it all works. RPubs user lionel also has a great working paper on the topic.

jdobres
7

The concept is called "non-standard evaluation", and there are many different ways in which it can be used in different R functions. See this book chapter for an introduction.

This language feature can be confusing, and arguably is not needed for the library() function, but it allows incredibly powerful code when you need to specify computations on data frames, as is the case in ggplot2 or in dplyr, for example.
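Base R already offers the same convenience for data frames through subset and with, which makes for a quick comparison. The data frame below is invented for this sketch; only the column names come from the question.

```r
# Made-up data, reusing column names from the question.
df <- data.frame(arrivalRate = c(10, 20, 30),
                 responseTime = c(1.5, 2.5, 6.0))

# Non-standard evaluation: the condition is evaluated with df's
# columns in scope, so arrivalRate needs no prefix.
fast <- subset(df, arrivalRate > 15)

# Standard evaluation: the data frame's name has to be repeated.
fast_se <- df[df$arrivalRate > 15, ]

# with() does the same for an arbitrary expression.
ratio <- with(df, mean(responseTime / arrivalRate))
```

The saving is small here, but it grows quickly once several columns and several steps are involved, which is the situation in ggplot2 and dplyr.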

Claus Wilke
5

The lines

library(ggplot2)
library("ggplot2")

are not equivalent. In the first line, ggplot2 is a symbol, which may or may not be bound to some value. In the second line, "ggplot2" is a character vector of length one.

A function, however, can manipulate the arguments it receives without evaluating them, and can decide to treat both cases equivalently, which is apparently what library does.

Here's an example of how to manipulate an unevaluated expression:

> f <- function(x) match.call()  # return unevaluated function call
> x <- f(foo)
> x
f(x = foo)
> mode(x)
[1] "call"
> x[[1]]
f
> x[[2]]
foo
> mode(x[[2]])
[1] "name"
> as.character(x[[2]])
[1] "foo"
> x <- f("foo")
> mode(x[[2]])
[1] "character"
Ernest A
  • 1
    To build on this example, here's a demonstration of how `match.call()` can be used to hand missing parameters from one function call to another: https://stackoverflow.com/a/46289614/4975218 – Claus Wilke Nov 24 '17 at 23:05