
As applied to the same R code or objects, quote and substitute typically return different objects. How can one make this difference apparent?

is.identical <- function(X){
  out <- identical(quote(X), substitute(X))
  out
}

> tmc <- function(X){
   out <- list(typ = typeof(X), mod = mode(X), cls = class(X))
   out
 }

> df1 <- data.frame(a = 1, b = 2)

Here the printed output of quote and substitute is the same.

> quote(df1)
df1
> substitute(df1)
df1

And the structure of the two is the same.

> str(quote(df1))
 symbol df1
> str(substitute(df1))
 symbol df1

And the type, mode and class are all the same.

> tmc(quote(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"

> tmc(substitute(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"

And yet, the outputs are not the same.

> is.identical(df1)
[1] FALSE

Note that this question shows some inputs that cause the two functions to display different outputs. However, the outputs are different even when they appear the same, and are the same by most of the usual tests, as shown by the output of is.identical() above. What is this invisible difference, and how can I make it appear?

Note on the tags: I am guessing that the Common LISP quote and the R quote are similar.

andrewH
  • Note that `identical(quote(df1),substitute(df1))` returns TRUE outside of a function. Inside the function, `substitute` modifies the promise `X` as documented. That modification is internal (i.e. C code internal) in a way that may not (I'm not sure) be visible from R at all except via its memory address, via, say `pryr::address()`. – joran Jan 18 '19 at 16:49
  • Thanks, @Joran! I can not find this difference in the help page for substitute, unless it is embedded in the examples somehow. Now that you have recalled it to me, I vaguely remember reading about it, maybe in Advanced R. Why isn't it considered to be inside a function when it is inside of identical? – andrewH Jan 18 '19 at 21:28
  • More on insideness: Adding superfluous { }'s changes the outcome. I would have expected the opposite result for both identical and { }. – andrewH Jan 18 '19 at 21:31
  • The part of the docs I was referring to was "If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol.". – joran Jan 18 '19 at 21:33
  • Remember that parens and curly braces are technically functions, so the same thing is happening there: `substitute` sees the object as a promise not the original object and modifies it. – joran Jan 18 '19 at 21:37
  • As for why it isn't "inside" in the `identical(quote(df1),substitute(df1))` case, there's no intermediate step there. For `is.identical` the object has to pass through the intermediate _promise_ of `X` before you get to `identical`, as opposed to going directly there. – joran Jan 18 '19 at 21:40
  • @Joran, sorry for being dense, but I am still confused. From the outputs of str, typeof, etc., and from just printing, it looks like both substitute and quote replace X with a language object, a name, df1. Is the idea that the substitute inside the is.identical function is taking an extra step that none of the others do? That quote is producing a name, while substitute turns into the actual data frame? – andrewH Jan 19 '19 at 01:59
  • I think part of the problem is that I have never understood exactly what conditions constitute being at the "top level." I would have said that the execution of substitute couldn't happen inside identical() and still be at the top level. Is that wrong? – andrewH Jan 19 '19 at 02:02
  • It’s fine. The piece you’re missing is that there’s a step in the middle when you use the function. R isn’t technically pass-by-value, it’s pass-by-promise. quote() and substitute() aren’t being called on df1, they are being called on X, which is a promise to deliver df1 when needed. So the mental shift you have to make is to remember that X and df1 are technically different things. – joran Jan 19 '19 at 02:09
  • One thing you can try that might help is to run `debugonce(is.identical); is.identical(df1)` and then as soon as you are inside the function, compare `str(quote(X))` with `str(substitute(X))`. One is the symbol X the other is the symbol df1. – joran Jan 19 '19 at 03:28
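
One way to make the difference discussed in these comments visible without the debugger is to return both objects rather than only the result of identical(). This is a minimal sketch; `show_both` is a hypothetical helper name that does not appear in the original post.

```r
# Hypothetical helper (not from the original post): return both results
# side by side instead of collapsing them into a single TRUE/FALSE.
show_both <- function(X) {
  list(quoted = quote(X), substituted = substitute(X))
}

df1 <- data.frame(a = 1, b = 2)
res <- show_both(df1)

as.character(res$quoted)                # "X"
as.character(res$substituted)           # "df1"
identical(res$quoted, res$substituted)  # FALSE
```

Both elements are symbols, which is why typeof, mode, and class agree; they are simply different symbols.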

1 Answer


The reason is that the behavior of substitute() is different based on where you call it, or more precisely, what you are calling it on.

Understanding what will happen requires a very careful parsing of the (subtle) documentation for substitute(), specifically:

Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.

So there are essentially three options.

In this case:

> df1 <- data.frame(a = 1, b = 2)
> identical(quote(df1),substitute(df1))
[1] TRUE

df1 is an "ordinary variable", but substitute() is called in .GlobalEnv, since the env argument defaults to the current evaluation environment. Hence we're in the very last case, where the symbol df1 is left unchanged, and so it is identical to the result of quote(df1).

In the context of the function:

is.identical <- function(X){
    out <- identical(quote(X), substitute(X))
    out
}

The important distinction is that now we're calling these functions on X, not df1. For most R users, this is a silly, trivial distinction, but when playing with subtle tools like substitute it becomes important. X is a formal argument of a function, so that implies we're in a different case of the documented behavior.

Specifically, it says that now "the expression slot of the promise replaces the symbol". We can see what this means if we debug() the function and examine the objects in the context of the function environment:

> debugonce(is.identical)
> is.identical(X = df1)
debugging in: is.identical(X = df1)
debug at #1: {
    out <- identical(quote(X), substitute(X))
    out
}
Browse[2]> 
debug at #2: out <- identical(quote(X), substitute(X))
Browse[2]> str(quote(X))
 symbol X
Browse[2]> str(substitute(X))
 symbol df1
Browse[2]> Q

Now we can see that what happened is precisely what the documentation said would happen (Ha! So obvious! ;) )

X is a formal argument, or a promise, which according to R is not the same thing as df1. For most people writing functions, they are effectively the same, but the internal implementation disagrees. X is a promise object, and substitute replaces the symbol X with the one that it "points to", namely df1. This is what the docs mean by the "expression slot of the promise": it is the expression R sees in the X = df1 part of the function call.
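
The "expression slot" is perhaps easier to see when the argument is a larger expression rather than a bare symbol. The following is a small sketch, not part of the original answer; `expr_of` is a hypothetical helper name, and the delayedAssign() call is included only because the documentation mentions it as the other way a promise can arise.

```r
# Hypothetical helper: return the expression slot of the promise bound to X.
expr_of <- function(X) substitute(X)

df1 <- data.frame(a = 1, b = 2)

expr_of(df1)                 # the symbol df1
expr_of(df1$a + 1)           # the unevaluated call df1$a + 1, not the value 2
class(expr_of(df1$a + 1))    # "call"

# A promise created explicitly with delayedAssign() is treated the same way.
e <- new.env()
delayedAssign("p", df1$a + 1, assign.env = e)
substitute(p, e)             # df1$a + 1
```

In each case substitute() hands back the code that was supplied, without evaluating it.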

To round things out, try to guess what will happen in this case:

is.identical <- function(X){
    out <- identical(quote(A), substitute(A))
    out
}

is.identical(X = df1)

(Hint: now A is not a "bound symbol in the environment".)

A final example illustrating more directly the final case in the docs with the confusing exception:

#Ordinary variable, but in .GlobalEnv
> a <- 2
> substitute(a)
a

#Ordinary variable, but NOT in .GlobalEnv
> e <- new.env()
> e$a <- 2
> substitute(a,env = e)
[1] 2
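
The three documented cases can also be put side by side inside a single function, where env defaults to the function's own evaluation environment rather than .GlobalEnv. This is only a sketch under that assumption; `three_cases`, `local_var`, and `not_bound` are hypothetical names chosen for illustration.

```r
three_cases <- function(X) {
  local_var <- 2  # ordinary variable in the function's environment
  list(
    promise  = substitute(X),          # formal argument: expression slot replaces X
    ordinary = substitute(local_var),  # ordinary variable, env is not .GlobalEnv: value substituted
    unbound  = substitute(not_bound)   # not bound in the function's environment: left unchanged
  )
}

df1 <- data.frame(a = 1, b = 2)
res <- three_cases(df1)

res$promise   # df1
res$ordinary  # [1] 2
res$unbound   # not_bound
```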
joran
  • Great post regarding the difference! – akrun Jan 19 '19 at 06:40
  • @joran, when the documentation says "unless env is .GlobalEnv," does this refer to the environment where the variable is defined, the execution environment of the function where the variable is used, the environment of the function that calls that function, the environment where the function that uses the variable is defined, or some other environment? – andrewH Jan 20 '19 at 03:18
  • @andrewH It is referring to the argument for `substitute()` called `env`, which defaults to the current evaluation environment, which typically will be where `substitute()` is called from. – joran Jan 20 '19 at 03:42
  • @andrewH So, for example, you can construct very simple examples with a few variables in `.GlobalEnv` and some others using the same symbol in another environment, say `e <- new.env()`, where substitute will produce different results depending on how you use that argument. – joran Jan 20 '19 at 03:45
  • Thanks, @joran! That's very helpful. – andrewH Jan 26 '19 at 22:07