What am I missing here?

d = data.table(a = 1:5)

d[, a]                   # 1 2 3 4 5
d[, sum(a)]              # 15

d[, eval(quote(a))]      # 1 2 3 4 5
d[, sum(eval(quote(a)))] # 15

quoted_a = quote(a)
d[, eval(quoted_a)]      # 1 2 3 4 5
d[, sum(eval(quoted_a))] # Error in eval(expr, envir, enclos) : object 'a' not found

What is going on? I'm running R 2.15.0 and data.table 1.8.9.

eddi
  • I had a problem like this today; was solved by using `quote(sum(a))` instead of `expression(sum(a))`. No idea why it mattered. – rbatt Dec 04 '15 at 02:18

1 Answer

UPDATE (eddi): As of version 1.8.11 this has been fixed, and .SD is not needed in cases where the expression can be evaluated in place, as in the OP. Because the presence of .SD currently triggers construction of the full .SD, leaving it out will be much faster in some cases.


What's going on is that calls to eval() are treated differently than you likely imagine in the code that implements [.data.table(). Specifically, [.data.table() contains special evaluation branches for i and j expressions that begin with the symbol eval. When you wrap the call to eval inside of a call to sum(), eval is no longer the first element of the parsed/substituted expression, and the special evaluation branch is skipped.

Here is the bit of code in the monster function displayed by typing getAnywhere("[.data.table") that makes a special allowance for calls to eval() passed in via [.data.table()'s j-argument:

jsub = substitute(j)
    ...
    # Skipping some lines
    ...
jsubl = as.list.default(jsub)
if (identical(jsubl[[1L]], quote(eval))) {  # The test for eval 'on the outside' 
    jsub = eval(jsubl[[2L]], parent.frame(), parent.frame())
    if (is.expression(jsub)) 
        jsub = jsub[[1L]]
}
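
To see why the test above matches one form of the call but not the other, here is a minimal base-R sketch that does not use data.table at all. It only inspects the first element of each unevaluated expression; the helper name head_is_eval is hypothetical and is not part of data.table.

```r
# Base R only: look at the first element of each unevaluated j expression.
j_outer_eval <- quote(eval(quoted_a))       # eval is the outermost call
j_inner_eval <- quote(sum(eval(quoted_a)))  # eval is nested inside sum()

# Hypothetical helper mirroring the identical(jsubl[[1L]], quote(eval)) test.
head_is_eval <- function(expr) identical(as.list(expr)[[1L]], quote(eval))

head_is_eval(j_outer_eval)  # TRUE:  the special eval branch would be taken
head_is_eval(j_inner_eval)  # FALSE: the first element is sum, so it is skipped
```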

As a workaround, either follow the example in data.table FAQ 1.6 (pdf here), or explicitly point eval() towards .SD, the local variable that holds the columns of the data.table you are operating on (here d). For more explanation of .SD's role, see the first few paragraphs of this answer.

d[, sum(eval(quoted_a, envir=.SD))]
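
The workaround relies on eval() accepting a list or data frame as its envir argument and looking up symbols among its columns. A rough base-R sketch of that mechanism follows; it assumes a plain data.frame standing in for .SD, and the names df and sum_column are made up for illustration.

```r
# Base R only: a data.frame stands in for .SD (an assumption for illustration).
df <- data.frame(a = 1:5)
quoted_a <- quote(a)

# eval() resolves the symbol `a` among the columns of the object given as envir.
eval(quoted_a, envir = df)        # 1 2 3 4 5
sum(eval(quoted_a, envir = df))   # 15

# Hypothetical wrapper: the caller supplies only the data and a quoted name.
sum_column <- function(data, quoted_col) sum(eval(quoted_col, envir = data))
sum_column(df, quote(a))          # 15
```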
Josh O'Brien
  • ok, how can I fix this? I'd like to pass a `data.table` and a variable name to a function that would compute e.g. the `sum`. I was using the above code before (and I could swear it used to work with a previous version of either `R` or `data.table`), but that seems to be out of the question ...? – eddi Apr 09 '13 at 22:59
  • How about `quoted_a <- quote(sum(a))` as shown in the [**faq 1.6**](http://datatable.r-forge.r-project.org/datatable-faq.pdf)? – Arun Apr 09 '13 at 23:02
  • that won't work for me, whoever is calling the function doesn't know what kind of evaluations are going to happen there – eddi Apr 09 '13 at 23:03
  • @eddi -- use the correct environment for `eval` when not using the special case in `[.data.table`. [This answer](http://stackoverflow.com/a/11874464/1385941) may be useful; using `sum(eval(quoted_a, .SD))` will work. – mnel Apr 09 '13 at 23:06
  • @mnel, how can one get the environment where the `expr` is to be evaluated? I mean, how did you get to `.SD` here? I tried with `parent.frame()` and it gave the same error. Any (other) references? – Arun Apr 09 '13 at 23:11
  • @Arun -- Are you saying this gives you an error? `d[, sum(eval(quoted_a, .SD))]` – Josh O'Brien Apr 09 '13 at 23:12
  • @JoshO'Brien, no, it works perfectly. My question is how to find out that the "envir" argument is `.SD` here... That is, I tried `d[, sum(eval(quoted_a, parent.frame()))]` and it gave the same error.. Perhaps I'm confused about the concept of `parent.frame()`. – Arun Apr 09 '13 at 23:15
  • @Arun -- Oh, I see. I think you just have to know about `.SD`'s central role in evaluation of all `i` and `j` expressions. The opening notes in [my answer here](http://stackoverflow.com/questions/15667984/can-sd-be-viewed-from-a-browser-within-data-table/15668311#15668311) should make it pretty clear why `.SD` is the environment you want. I expect that you can't find this out via calls to `parent.frame()` because `[.data.table` sidesteps/reroutes some of R's typical scoping rules. – Josh O'Brien Apr 09 '13 at 23:18
  • @JoshO'Brien, that answers my question. Thank you. However, another hidden question: without already knowing that `.SD` is the environment, can we find out that expressions in `j` are evaluated in `.SD`? That is, can we use R commands to "discover" the environment in which an expression is executed? I am beginning to think that I am making less sense with this question, but had to ask. – Arun Apr 09 '13 at 23:25
  • @Arun -- You're making plenty of sense. **data.table**'s power comes at the cost to its users of having to build a whole additional model of how its magic "really" happens. I'm still on the steep part of that learning curve myself ;) – Josh O'Brien Apr 09 '13 at 23:30
  • @JoshO'Brien Agreed, that's true of advanced users. But most users can just follow the idiomatic patterns (hopefully?) without having to know how it really happens. – Matt Dowle Apr 10 '13 at 09:51
  • For all that, I submit that `data.table` should not do anything that breaks the expected behavior of nested functions at the command line. If `regular_matrix[,sum(eval(quoted_a))]` "works," then `data.table`'s failure to do so should be considered a bug. – Carl Witthoft Apr 10 '13 at 12:43
  • @Carl I agree. The `data.frame` analog of the above: `with(data.frame(d), sum(eval(quoted_a)))` works as expected. lol, in fact `with(d, sum(eval(quoted_a)))` also works as expected. – eddi Apr 10 '13 at 19:04
  • @MatthewDowle -- For the most part I agree, and I wouldn't go nearly so far as to call data.table's behavior a bug, but it definitely does violate the principle of least surprise in a few places. (Maybe it has to, though. I was actually diving into `[.data.table` to test out some ideas about possible alternatives when I ran into the problem reported in [my question of a few hours ago](http://stackoverflow.com/questions/15931801/why-does-trace-edit-true-not-work-when-data-table)!) – Josh O'Brien Apr 10 '13 at 19:36