One solution is to put the list(...) call within the function output. I tend to use as.quoted, stealing from the way @hadley implements .() in the plyr package.
library(data.table)
library(plyr)
dat <- data.table(x_one=1:10, x_two=1:10, y_one=1:10, y_two=1:10)
myfun <- function(name) {
    one <- paste0(name, '_one')
    two <- paste0(name, '_two')
    out <- paste0(name, '_out')
    as.quoted(paste('list(', out, '=', one, '-', two, ')'))[[1]]
}
dat[, eval(myfun('x')),]
# x_out
# 1: 0
# 2: 0
# 3: 0
# 4: 0
# 5: 0
# 6: 0
# 7: 0
# 8: 0
# 9: 0
#10: 0
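For intuition, it can help to print what myfun() returns: it is an unevaluated call, not a computed result, and data.table later evaluates that call with the columns in scope.
myfun('x')
# list(x_out = x_one - x_two)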
To do two columns at once you can adjust your call:
myfun <- function(name) {
    one <- paste0(name, '_one')
    two <- paste0(name, '_two')
    out <- paste0(name, '_out')
    calls <- paste(paste(out, '=', one, '-', two), collapse = ',')
    as.quoted(paste('list(', calls, ')'))[[1]]
}
dat[, eval(myfun(c('x','y'))),]
# x_out y_out
# 1: 0 0
# 2: 0 0
# 3: 0 0
# 4: 0 0
# 5: 0 0
# 6: 0 0
# 7: 0 0
# 8: 0 0
# 9: 0 0
#10:     0     0
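As before, the function returns a single unevaluated list(...) call covering both columns:
myfun(c('x','y'))
# list(x_out = x_one - x_two, y_out = y_one - y_two)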
As for the reason: [.data.table checks whether j is a call whose first element is eval. If so, it evaluates eval's argument in the parent.frame() to obtain the quoted list(...) call, and then evaluates that call within the data.table. The relevant code within [.data.table is
if (missing(j)) stop("logical error, j missing")
jsub = substitute(j)
if (is.null(jsub)) return(NULL)
jsubl = as.list.default(jsub)
if (identical(jsubl[[1L]], quote(eval))) {
    jsub = eval(jsubl[[2L]], parent.frame())
    if (is.expression(jsub)) jsub = jsub[[1L]]
}
In your case,
j = list(xout = eval(myfun('x')))
so
jsub <- substitute(j)
is
# list(xout = eval(myfun("x")))
and
as.list.default(jsub)
gives
## [[1]]
## list
##
## $xout
## eval(myfun("x"))
So jsubl[[1L]] is list (not eval) and jsubl[[2L]] is eval(myfun("x")): data.table has not found a call to eval at the top of j, and will not deal with it appropriately.
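You can check that detection logic directly; this small sketch mirrors the source above, and only a call whose first element is eval passes the identical() test:
jsub  <- quote(list(xout = eval(myfun("x"))))
jsubl <- as.list.default(jsub)
identical(jsubl[[1L]], quote(eval))
# [1] FALSE -- the head of this call is list, so the eval branch is skipped
identical(as.list.default(quote(eval(myfun("x"))))[[1L]], quote(eval))
# [1] TRUE  -- a j starting with eval is detected and handled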
This will work, forcing the evaluation to happen within the correct environment, the data.table itself:
# using OP myfun
dat[, list(xout = eval(myfun('x'), dat))]
In the same way
eval(parse(text = 'x_one'), dat)
# [1]  1  2  3  4  5  6  7  8  9 10
works, but
eval(eval(parse(text = 'x_one')), dat)
does not.
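That failure is because the inner eval runs in the calling frame, where no x_one exists, so it errors before the outer eval ever sees dat (error message abbreviated):
eval(parse(text = 'x_one'))
# Error: object 'x_one' not found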
Edit 10/4/13
It is probably safer (but slower) to use .SD as the environment, as it will then be robust to i or by as well, e.g.
dat[, list(xout = eval(myfun('x'), .SD))]
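A sketch of the difference (a hypothetical illustration; the c() wrapper simply returns the evaluated list as-is, so j does not start with eval and the top-of-j detection shown earlier stays out of the way). With an i subset, dat still refers to the whole table, whereas .SD contains only the selected rows:
dat[x_one > 5, c(eval(myfun('x'), .SD))]   # x_out computed on the 5 selected rows
dat[x_one > 5, c(eval(myfun('x'), dat))]   # x_out computed on all 10 rows of dat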
Edit from Matthew:
+10 to above. I couldn't have explained it better myself. Taking it a step further, what I sometimes do is construct the entire data.table query and then eval that. It can be a bit more robust that way, sometimes. I think of it like SQL; i.e., we often construct a dynamic SQL statement that is sent to the SQL server to be executed. When you are debugging, it's also sometimes easier to look at the constructed query and run it at the browser prompt. But sometimes such a query would be very long, so passing eval into i, j or by can be more efficient by not recomputing the other components. As usual, many ways to skin the cat.
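A minimal sketch of that whole-query approach (build_query is a hypothetical helper): construct the full query as a string, inspect it while debugging, then eval the parsed call:
build_query <- function(name) {
    sprintf("dat[, list(%s_out = %s_one - %s_two)]", name, name, name)
}
q <- build_query('x')
q
# [1] "dat[, list(x_out = x_one - x_two)]"   # easy to inspect or paste at the prompt
eval(parse(text = q))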
The subtle reasons for considering evaling the entire query include:
One reason grouping is fast is that it inspects the j expression first. If it's a list, it removes the names but remembers them. It then evals an unnamed list for each group, then reinstates the names once, at the end, on the final result. One reason other methods can be slow is the recreation of the same column name vector for each and every group, over and over again. The more complex j is, though (e.g. if the expression doesn't start precisely with list), the harder it gets to code up the inspection logic internally. There are lots of tests in this area, e.g. in combination with eval, and verbosity reports if name dropping isn't working. But constructing a "simple" query (the full query) and evaling that may be faster and more robust for this reason.
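For example, evaling a full query whose j starts precisely with list keeps it in the form the grouping fast path recognises (a sketch on the toy table above):
q <- quote(dat[, list(x_out = x_one - x_two), by = x_two])
eval(q)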
With v1.8.2 there's now optimization of j: options(datatable.optimize=Inf). This inspects j and modifies it to optimize mean and the lapply(.SD,...) idiom, so far. This makes orders of magnitude difference and means there's less for the user to need to know (e.g. a few of the wiki points have gone away now). We could do more of this; e.g., DT[a==10] could be optimized to DT[J(10)] automatically if key(DT)[1]=="a" [Update Sep 2014: now implemented in v1.9.3]. But again, the internal optimizations get harder to code up if, rather than DT[,mean(a),by=b], it's DT[,list(x=eval(expr)),by=b] where expr contains a call to mean, for example. So evaling the entire query may play nicer with datatable.optimize. Turning verbosity on reports what it's doing, and optimization can be turned off if needed, e.g. to test the speed difference it makes.
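For example (a sketch; the exact verbose wording varies by version):
options(datatable.optimize = Inf)                      # full optimization of j
dat[, lapply(.SD, mean), by = x_two, verbose = TRUE]   # verbose reports the rewritten j
options(datatable.optimize = 0)                        # switch it off to compare timings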
As per comments, FR#2183 has been added: "Change j=list(xout=eval(...))'s eval to eval within scope of DT". Thanks for highlighting. That's the sort of complex j I mean, where the eval is nested in the expression. If j starts with eval, though, that's much simpler and already coded (as shown above) and tested, and should be optimized fine.
If there's one take-away from this, it's: do use DT[...,verbose=TRUE] or options(datatable.verbose=TRUE) to check that data.table is still working efficiently when used for dynamic queries involving eval.
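For instance, with the myfun from above (output not shown; the wording varies by version):
dat[, eval(myfun('x')), verbose = TRUE]
# verbose reports how j was handled and whether the optimizations applied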