
Here is an example from Hadley's Advanced R book:

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}

scramble <- function(x) x[sample(nrow(x)), ]

subscramble <- function(x, condition) {
  scramble(subset2(x, condition))
}

subscramble(sample_df, a >= 4)
# Error in eval(expr, envir, enclos) : object 'a' not found

Hadley explains:

Can you see what the problem is? condition_call contains the expression condition. So when we evaluate condition_call it also evaluates condition, which has the value a >= 4. However, this can’t be computed because there’s no object called a in the parent environment.

I understand that there is no a in the parent env, but eval(condition_call, x, parent.frame()) evaluates condition_call in x (a data.frame used as an environment), enclosed by parent.frame(). As long as there is a column named a in x, why should there be any problem?

Josh O'Brien
qed

3 Answers


tl;dr

When subset2() is called from within subscramble(), condition_call's value is the symbol condition (rather than the call a >= 4 that results when it is called directly). subset2()'s call to eval() searches for condition first in envir=x (the data.frame sample_df). Not finding it there, it next searches in enclos=parent.frame(), where it does find an object named condition.

That object is a promise object, whose expression slot is a >= 4 and whose evaluation environment is .GlobalEnv. Unless an object named a is found in .GlobalEnv or further up the search path, evaluation of the promise then fails with the observed message that: Error in eval(expr, envir, enclos) : object 'a' not found.


Detailed explanation

A nice way to discover what's going wrong here is to insert a browser() call right before the line at which subset2() fails. That way, we can call it both directly and indirectly (from within another function), and examine why it succeeds in the first case and fails in the second.

subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  browser()
  r <- eval(condition_call, x, parent.frame())  ## <- Point of failure
  x[r, ]
}

Calling subset2() directly

When a user calls subset2() directly, condition_call <- substitute(condition) assigns to condition_call a "call" object containing the unevaluated call a >= 4. This call is passed to eval(expr, envir, enclos), whose first argument should be an object of class call, name, or expression. So far so good.

subset2(sample_df, a >= 4)
## Called from: subset2(sample_df, a >= 4)
Browse[1]> is(condition_call)
## [1] "call"     "language"
Browse[1]> condition_call
## a >= 4

eval() now sets to work, searching for the values of any symbols contained in expr=condition_call first in envir=x and then (if needed) in enclos=parent.frame() and its enclosing environments. In this case, it finds the symbol a in envir=x (and the symbol >= in package:base) and successfully completes the evaluation.

Browse[1]> ls(x)
## [1] "a" "b" "c"
Browse[1]> get("a", x)
## [1] 1 2 3 4 5
Browse[1]> eval(condition_call, x, parent.frame())
## [1] FALSE FALSE FALSE  TRUE  TRUE
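The envir-then-enclos lookup order can also be checked outside the browser. Here is a minimal sketch; caller_env and threshold are hypothetical names I made up to stand in for a caller's frame and one of its variables, and they are not part of the original example.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# Hypothetical stand-in for a caller's frame. It holds `threshold`, plus an
# `a` that should be ignored because the data frame column is found first.
caller_env <- new.env()
assign("threshold", 4, envir = caller_env)
assign("a", 100, envir = caller_env)

# `a` resolves to the column in sample_df; `threshold` is not a column,
# so the lookup falls through to caller_env (the enclosure).
result <- eval(quote(a >= threshold), sample_df, caller_env)
print(result)
```

If the a in caller_env were the one being used, every element would be TRUE, so the output shows which of the two was picked.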

Calling subset2() from within subscramble()

Within the body of subscramble(), subset2() is called like this: subset2(x, condition). Fleshed out, that call is really equivalent to subset2(x=x, condition=condition). Because its supplied argument (i.e. the value passed to the formal argument named condition) is the expression condition, condition_call <- substitute(condition) assigns to condition_call the symbol object condition. (Understanding that point is pretty key to understanding exactly how the nested call fails.)

Since eval() is happy to have a symbol (aka "name") as its first argument, once again so far so good.

subscramble(sample_df, a >= 4)
## Called from: subset2(x, condition)
Browse[1]> is(condition_call)
## [1] "name"      "language"  "refObject"
Browse[1]> condition_call
## condition
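The same difference can be reproduced without browser(). This is only a sketch: capture_arg() and forward_arg() are hypothetical helpers, standing in for the first line of subset2() and for subscramble() respectively.

```r
# Hypothetical helper: does what the first line of subset2() does.
capture_arg <- function(condition) substitute(condition)

# Hypothetical helper: like subscramble(), it just passes its argument along.
forward_arg <- function(condition) capture_arg(condition)

direct <- capture_arg(a >= 4)   # argument expression is the call a >= 4
nested <- forward_arg(a >= 4)   # argument expression is the symbol condition

print(class(direct))
print(deparse(direct))
print(class(nested))
print(deparse(nested))
```

In the nested case substitute() only ever sees what was written at the call site inside forward_arg(), which is the bare name condition.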

Now eval() goes to work searching for the unresolved symbol condition. No column in envir=x (the data.frame sample_df) matches, so it moves on to enclos=parent.frame(). For fairly complicated reasons, that environment turns out to be the evaluation frame of the call to subscramble(). There, it does find an object named condition.

Browse[1]> ls(x)
## [1] "a" "b" "c"
Browse[1]> ls(parent.frame()) ## Aha! Here's an object named "condition"
## [1] "condition" "x"

As an important aside, it turns out there are several objects named condition on the call stack above the environment from which browser() was called.

Browse[1]> sys.calls()
# [[1]]
# subscramble(sample_df, a >= 4)
# 
# [[2]]
# scramble(subset2(x, condition))
# 
# [[3]]
# subset2(x, condition)               
# 
Browse[1]> sys.frames()
# [[1]]
# <environment: 0x0000000007166f28>   ## <- Envt in which `condition` is evaluated
# 
# [[2]]
# <environment: 0x0000000007167078>
# 
# [[3]]
# <environment: 0x0000000007166348>   ## <- Current environment


## Orient ourselves a bit more
Browse[1]> environment()            
# <environment: 0x0000000007166348>
Browse[1]> parent.frame()           
# <environment: 0x0000000007166f28>

## Both environments contain objects named 'condition'
Browse[1]> ls(environment())
# [1] "condition"      "condition_call" "x"             
Browse[1]> ls(parent.frame())
# [1] "condition" "x"  

Inspecting the condition object found by eval() (the one in parent.frame(), which turns out to be the evaluation frame of subscramble()) takes some special care. I used recover() and pryr::promise_info() as shown below.

That inspection reveals that condition is a promise whose expression slot is a >= 4 and whose environment is .GlobalEnv. Our search for a has by this point moved well past sample_df (where a value of a was to be found), so evaluation of the expression slot fails (unless an object named a is found in .GlobalEnv or somewhere else farther up the search path).

Browse[1]> library(pryr) ## For is_promise() and promise_info()  
Browse[1]> recover()
# 
# Enter a frame number, or 0 to exit   
# 
# 1: subscramble(sample_df, a >= 4)
# 2: #2: scramble(subset2(x, condition))
# 3: #1: subset2(x, condition)
# 
Selection: 1
# Called from: top level 
Browse[3]> is_promise(condition)
# [1] TRUE
Browse[3]> promise_info(condition)
# $code
# a >= 4
# 
# $env
# <environment: R_GlobalEnv>
# 
# $evaled
# [1] FALSE
# 
# $value
# NULL
# 
Browse[3]> get("a", .GlobalEnv)
# Error in get("a", .GlobalEnv) : object 'a' not found
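A consequence of the promise being evaluated in the caller's environment is that a stray a there makes the nested call "succeed" with the wrong rows. The sketch below is my own illustration, not from the book: subset_via_wrapper() is a hypothetical wrapper that behaves like subscramble() without the shuffling, and local() is used only to keep the extra a out of the workspace.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}

# Hypothetical wrapper: subscramble() minus scramble(), so row order is fixed.
subset_via_wrapper <- function(x, condition) subset2(x, condition)

counts <- local({
  a <- 4  # a variable in the calling environment, unrelated to sample_df$a
  c(
    direct = nrow(subset2(sample_df, a >= 4)),            # column a: rows 4 and 5
    nested = nrow(subset_via_wrapper(sample_df, a >= 4))  # caller's a: 4 >= 4 is TRUE
  )
})
print(counts)
```

The nested call returns all five rows, because the condition collapses to a single TRUE. That is arguably worse than the error, since nothing signals that the column was never consulted.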

For one more piece of evidence that the promise object condition is being found in enclos=parent.frame(), one can point enclos somewhere else farther up the search path, so that parent.frame() is skipped during condition_call's evaluation. When one does that, subscramble() again fails, but this time with a message that condition itself was not found.

## Compare
Browse[1]> eval(condition_call, x, parent.frame())
# Error in eval(expr, envir, enclos) (from #4) : object 'a' not found

Browse[1]> eval(condition_call, x, .GlobalEnv)
# Error in eval(expr, envir, enclos) (from #4) : object 'condition' not found
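For anyone who wants to rerun that comparison non-interactively, here is a rough sketch. subset2_enclos(), wrap_default(), wrap_global() and error_message() are all hypothetical names introduced for this illustration. It assumes no objects named a or condition are visible from where it is run.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# Hypothetical variant of subset2() that lets the caller choose the enclosure.
subset2_enclos <- function(x, condition, enclos = parent.frame()) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, enclos)
  x[r, ]
}

# Two wrappers in the role of subscramble(): one keeps the default enclosure,
# the other skips its own frame by pointing enclos at the global environment.
wrap_default <- function(x, condition) subset2_enclos(x, condition)
wrap_global <- function(x, condition) {
  subset2_enclos(x, condition, enclos = globalenv())
}

# Returns the error message raised by expr, or NA if there was no error.
error_message <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

msg_default <- error_message(wrap_default(sample_df, a >= 4))
msg_global <- error_message(wrap_global(sample_df, a >= 4))
print(msg_default)  # complains about 'a'
print(msg_global)   # complains about 'condition'
```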
Josh O'Brien
  • Nice to get the clarification on the eval behavior that triggers this, and the use of browser(). – Sean Murphy Jun 01 '15 at 22:00
    @JoshO'Brien Excellent piece of research in both answers (and thank you) but exploring this myself, I fell onto a wall trying to apply the above explanation. Being in the browser (same place as yours) the following command: `eval(condition_call, x, parent.env(environment()))` returns: `Error in eval(expr, envir, enclos) : object 'condition' not found`. If the explanation that the promise is evaluated first in the `subscramble(sample_df, a >= 4) ` call where `condition` exists then I shouldn't be receiving this error, right? I should be receiving `object 'a' not found`. – LyzandeR Jun 02 '15 at 11:31
  • @LyzandeR You're right. That's fascinating. So now, what it (strongly) looks like to me is that `condition_call`'s value really is treated as a symbol named `condition`. It is first searched for in `envir=sample_df` where, of course, it's not found. Next it's searched for in `enclos=parent.frame()` where a promise object of that name **does** exist, with value `a>=4`. But since the symbol `a` doesn't exist in `enclos`, nor in its parent environment (`.GlobalEnv`) or anywhere else further up the search path, it throws the error we see. – Josh O'Brien Jun 02 '15 at 14:59
  • @LyzandeR In your nifty example, with `enclos=parent.env(environment())`, the symbol table in `parent.frame()` (where the promise named `condition` is stored) is bypassed during the lookup process, so we get an error that `object 'condition' not found`. Super interesting, and it actually makes more sense than my explanation, which had to ignore the fact that `is(condition_call)` was flat out telling me that its value was a `name`. – Josh O'Brien Jun 02 '15 at 15:05
  • I'm pretty sure those last two comments give the real answer. No time to write it up right now, though. Do you think I should eventually write it up as a separate answer, or rewrite this one? – Josh O'Brien Jun 02 '15 at 15:07
  • @JoshO'Brien Thanks Josh! This is amazing! I felt inside of me that the initial explanation was missing something but now I am very confident that this is the right answer. Finally!! I have been thinking about this question for the last 2 days. This sounds like the right explanation and all of the pieces seem to be connecting (without violating any internal R rules that I know of). – LyzandeR Jun 02 '15 at 15:20
  • @JoshO'Brien I think you should rewrite this one, but please try to be as explanatory as in this answer (because the process of finding the answer above is perfect - shows a methodology of how to solve a problem which is equally as important). Also, please try to keep the process of finding the answer (i.e. keep the browser parts as well as the last bit of code using `is_promise` and `sys.frames` and `sys.calls`) as I found it extremely helpful. Thanks again for this. Really good research, good R knowledge and a very good explanation all in all. Thanks. – LyzandeR Jun 02 '15 at 15:25
  • @LyzandeR Yes, those are *exactly* the feelings I had and now have. The explanation I gave yesterday made me think, is that *really* how they've set things up? Can that be? How annoying! I'll go ahead and rewrite this, but possibly not until tomorrow. And I will certainly leave the intermediate steps in there. I think I included that in the first place because I wanted to show my not-fully baked thought process, and I'm glad I did; tracking this one down with you and Sean Murphy has been a very rewarding pursuit. – Josh O'Brien Jun 02 '15 at 15:28
  • Indeed it has for me as well, I learnt a lot :) and I am 100% certain that many users will find it extremely helpful too. Well done! – LyzandeR Jun 02 '15 at 15:42
  • @LyzandeR OK, I just posted a completely rewritten (and now correct!) answer. The explanation's not real polished, but I think it basically captures what's going on. – Josh O'Brien Jun 03 '15 at 16:19

This was a tricky one, so thanks for the question. The error has to do with how substitute acts when it's called on an argument. If we look at the help text from substitute():

Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol.

What this means is that when you evaluate condition within the nested subset2 function, substitute sets condition_call to be the promise object of the unevaluated 'condition' argument. Since promise objects are pretty obscure, the definition is here: http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Promise-objects

The key points from there are:

Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment.

and

When the argument is accessed, the stored expression is evaluated in the stored environment, and the result is returned

Basically, within the nested function, condition_call is set to the promise object condition, rather than the substitution of the actual expression contained within condition. Because promise objects 'remember' the environment they come from, it seems this overrides the behavior of eval() - so regardless of the second argument to eval(), condition_call is evaluated within the parent environment that the argument was passed from, in which there is no 'a'.

You can create promise objects with delayedAssign() and observe this directly:

delayedAssign("condition", a >= 4)
substitute(condition)
eval(substitute(condition), sample_df)

You can see that substitute(condition) does not return a >= 4, but simply condition, and that trying to evaluate it within the environment of sample_df fails as it does in Hadley's example.
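One thing worth separating out here: as far as I understand the substitute() help page, symbols are left unchanged whenever env is .GlobalEnv, so the top-level result may say more about the global environment than about promises. The sketch below runs the same delayedAssign() inside a function instead; promise_in_function() is a hypothetical name used only for this illustration.

```r
promise_in_function <- function() {
  # Creates a promise named `condition` in this function's own frame.
  # The expression a >= 4 is stored unevaluated.
  delayedAssign("condition", a >= 4)
  # Inside a function frame, the promise's expression slot replaces the symbol.
  substitute(condition)
}
in_function <- promise_in_function()

# With env = globalenv(), substitute() leaves the symbol as it is.
at_global <- substitute(condition, globalenv())

print(in_function)
print(at_global)
```

So inside a function the promise's expression does get substituted, which matches the quoted help text; the bare symbol only comes back when the lookup environment is the global one, or when the symbol is bound to a promise whose own expression is just that symbol (the nested subset2() case).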

Hopefully this is helpful, and I'm sure someone else can clarify further.

Sean Murphy

In case anyone else stumbles upon this thread, here is the answer to task #5 below this section in Hadley's book. It also contains a possible general solution to the problem discussed above.

subset2 <- function(x, condition, env = parent.frame()) {
  condition_call <- substitute(condition, env)
  r <- eval(condition_call, x, env)
  x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
  scramble(subset2(x, condition))
}
subscramble(sample_df, a >= 3)

The magic happens in the second line of subset2. There, substitute receives an explicit env argument. From the help section for substitute: "substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env." env "Defaults to the current evaluation environment". Instead, we use the calling environment.

Check it out like this:

debugonce(subset2)
subscramble(sample_df, a >= 3)
Browse[2]> substitute(condition)
condition
Browse[2]> substitute(condition, env)
a >= 3

I am not 100% certain about the explanation here. I think it just is the way substitute works. From the help page for substitute:

Substitution takes place by examining each component of the parse tree as follows: (...) If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted (...).

In the current environment, condition is a promise, so the expression slot is filled, and more importantly, condition_call receives a symbol as a value. In the calling environment, condition is just an ordinary variable, so the value (the expression) is substituted.
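A caveat that may be worth checking before relying on this version. As far as I can tell, condition in the calling frame is itself a promise (it is subscramble()'s formal argument), and the substitution only works because the caller happens to have something bound to the name condition. The sketch below reuses the definitions above and adds a direct call; direct_msg is a name introduced here, and the check assumes no object named condition is visible at the call site.

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset2 <- function(x, condition, env = parent.frame()) {
  condition_call <- substitute(condition, env)
  r <- eval(condition_call, x, env)
  x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) scramble(subset2(x, condition))

# Nested call: `condition` is bound in subscramble()'s frame, so it is
# replaced by a >= 3 and the filter sees the column.
nested <- subscramble(sample_df, a >= 3)
print(sort(nested$a))

# Direct call: the caller has no `condition`, so the symbol is left
# unchanged and eval() then fails to find it.
direct_msg <- tryCatch(
  { subset2(sample_df, a >= 3); NA_character_ },
  error = function(e) conditionMessage(e)
)
print(direct_msg)
```

If both kinds of call need to work, one option is the approach Advanced R itself takes: split the function into a version that accepts an already-quoted condition and a thin wrapper that calls substitute() exactly once.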

c06n