I'm trying to understand the behaviour of eval in a data.table as a "frame".

With the following data.table:

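library(data.table)
set.seed(1)
# spell out replace = TRUE instead of relying on partial argument matching (r=T)
foo = data.table(var1 = sample(1:3, 1000, replace = TRUE), var2 = rnorm(1000), var3 = sample(letters[1:5], 1000, replace = TRUE))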

I'm trying to replicate this expression:

foo[var1==1 , sum(var2) , by=var3]

using a wrapper function around eval:

eval1 = function(s) eval( parse(text=s) ,envir=sys.parent() )

As you can see, tests 1 and 3 work, but I don't understand which is the "correct" envir to pass to eval for test 2:

var_i="var1"
var_j="var2"
var_by="var3"

# test 1 works
foo[eval1(var_i)==1 , sum(var2) , by=var3 ]

# test 2 doesn't work
foo[var1==1 , sum(eval1(var_j)) , by=var3]

# test 3 works
foo[var1==1 , sum(var2) , by=eval1(var_by)]
statadat
  • This seems to work `v1 <- parse(text=paste(var_i, "==", 1)); v2 <- parse(text=paste0("sum(", var_j, ", na.rm = TRUE)")); by1 <- parse(text=var_by); f1 <- function(dt, expr1, expr2, expr3){ dt[eval(expr1), eval(expr2), by=eval(expr3)] }; f1(foo, v1, v2, by1)` – akrun Nov 12 '14 at 10:27
  • Thanks @akrun, it works but it isn't what I'm looking for. I would try (if possible) to understand the frame of the data.table in the j expression ( DT[i,j,by] ). Thanks again, hope my question is clear... – statadat Nov 12 '14 at 10:44

1 Answer


The j-expression looks up its variables in the environment of .SD, which stands for Subset of Data. .SD is itself a data.table that holds the columns for the current group.

When you do:

foo[var1 == 1, sum(eval(parse(text=var_j))), by=var3]

directly, the j-expression is internally optimised and replaced with sum(var2). But sum(eval1(var_j)) is not optimised and stays as it is.

Then, when it is evaluated for each group, it has to find var2, which exists not in the parent.frame() from which the function is called, but in .SD. As an example, let's do this:

eval1 <- function(s) eval(parse(text=s), envir=parent.frame())
foo[var1 == 1, { var2 = 1L; eval1(var_j) }, by=var3]
#    var3 V1
# 1:    e  1
# 2:    c  1
# 3:    a  1
# 4:    b  1
# 5:    d  1
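
The same kind of lookup can be reproduced without data.table. What follows is a minimal base R sketch, in which a hypothetical caller() function stands in for the environment where j is evaluated. It only illustrates how envir = parent.frame() resolves names and makes no claim about data.table's internals.

```r
# eval1 evaluates the string in the frame of whoever called eval1
eval1 <- function(s) eval(parse(text = s), envir = parent.frame())

# caller() is a hypothetical stand-in for the j environment:
# it defines a local var2, just like { var2 = 1L; ... } does
caller <- function() {
  var2 <- 1L
  eval1("var2")  # resolved in caller()'s frame, where var2 is defined
}

res <- caller()
```

Here res is the locally defined value, because the frame that called eval1 is the one that holds var2.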

In the data.table example above, eval1 finds var2 in its parent frame, because var2 was defined there. To reach the column instead, we have to point eval to the right environment to evaluate in, by passing an additional argument with the value .SD:

eval1 <- function(s, env) eval(parse(text=s), envir = env, enclos = parent.frame())
foo[var1 == 1, sum(eval1(var_j, .SD)), by=var3]
#    var3         V1
# 1:    e  11.178035
# 2:    c -12.236446
# 3:    a  -8.984715
# 4:    b  -2.739386
# 5:    d  -1.159506
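
To see why passing .SD helps, here is a base R sketch of the same envir/enclos mechanism. A plain data.frame named sd_like (a hypothetical stand-in for .SD, with made-up values) plays the role of the group's data. eval accepts a list or data frame as envir, and falls back to enclos for names it cannot find there.

```r
# Same two-argument eval1 as above: look in env first, then in the caller's frame
eval1 <- function(s, env) eval(parse(text = s), envir = env, enclos = parent.frame())

sd_like <- data.frame(var2 = c(1.5, 2.5, 3.0))  # hypothetical stand-in for .SD
var_j   <- "var2"
offset  <- 10  # defined in the calling environment, not in sd_like

# var2 is resolved as a column of sd_like
total <- sum(eval1(var_j, sd_like))

# var2 comes from sd_like, offset is found through enclos
shifted <- sum(eval1("var2 + offset", sd_like))
```

In this sketch total is 7 and shifted is 37: column names take precedence, and anything not found among the columns is looked up in the frame that called eval1.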
Arun