
I have several very large data sets stored as data.tables, plus other data.tables that hold lists of what are essentially expressions. I'd like to execute those expressions against the data using something like `:=`.

Sample data:

library(data.table)
tt = data.table(date=c(2011, 2012, 2013, 2014), count=c(2774343,4655434,648113695, 357733805))

   date     count
1: 2011   2774343
2: 2012   4655434
3: 2013 648113695
4: 2014 357733805

Sample transformation table. Some columns may be new, others may be modifying pre-existing columns. I need them to take full advantage of the 'with' feature, meaning they need to reference existing columns even if they are creating new ones.

xform=data.table(var=c("date2", "count2"), val=c("date - 2000", "count / 1000"))

      var          val
1:  date2  date - 2000
2: count2 count / 1000

I just can't imagine the magic formula needed to get this to work. I've tried various combinations of lapply, parse, eval, etc. inside [.data.table using :=.

My last hope was this:

> xform[,expr := lapply(val, FUN=function(x) parse(text=x))]
> tt[,(xform$var) := eval(xform$expr)]
Error in eval(expr, envir, enclos) : attempt to apply non-function

The trick is my input data is massive and contains up to 100 columns, and while some of the transformations may be trivial, others may be sophisticated.

In this case, the output should be something like:

   date     count date2     count2
1: 2011   2774343    11   2774.343
2: 2012   4655434    12   4655.434
3: 2013 648113695    13 648113.695
4: 2014 357733805    14 357733.805

Thanks in advance for any help!

David Arenburg
Grisby_2133
  • I only managed to get it to work with one expression, e.g., `xform=data.table(var="date2", val="date - 2000"); tt[,xform$var := eval(parse(text = xform$val))]` – David Arenburg Jun 10 '14 at 20:02
  • I tried that too! Since parse can't take a list, I was trying the lapply(val,parse) and eval stages separately; I think I could be bumping into some of the internal logic of `[.data.table` and `:=`, but I can't imagine what. – Grisby_2133 Jun 10 '14 at 20:08

1 Answer


I think you have to do this row by row (of xform):

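# Apply one transformation per row of xform; later rows can use columns created by earlier ones
for (i in seq_len(nrow(xform)))
  tt[, (xform$var[i]) := eval(parse(text = xform$val[i]))]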

tt
#   date     count date2     count2
#1: 2011   2774343    11   2774.343
#2: 2012   4655434    12   4655.434
#3: 2013 648113695    13 648113.695
#4: 2014 357733805    14 357733.805

If you stored the transformations as quoted (unevaluated) expressions instead of text, though, you could do the following instead:

xform = data.table(var = c("date2", "count2"),
                   val = c(quote(date - 2000), quote(count / 1000)))

tt[, xform$var := lapply(xform$val, eval, .SD)]
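If the transformations have to stay as text, the two approaches can be combined. The following is a minimal sketch, assuming the sample `tt` and `xform` from the question; the name `exprs` is made up for illustration, and the parsed calls are kept in a plain list rather than in a data.table column.

```r
library(data.table)

tt <- data.table(date = c(2011, 2012, 2013, 2014),
                 count = c(2774343, 4655434, 648113695, 357733805))
xform <- data.table(var = c("date2", "count2"),
                    val = c("date - 2000", "count / 1000"))

# Parse each string once; [[1]] extracts the call from the expression vector
# that parse() returns. `exprs` is a hypothetical helper name.
exprs <- lapply(xform$val, function(x) parse(text = x)[[1]])

# Evaluate every call with the columns of tt (.SD) as the lookup environment
# and assign all results in a single := call
tt[, (xform$var) := lapply(exprs, eval, .SD)]
```

Note that, unlike the loop, every expression here is evaluated against the columns that exist before the assignment, so one transformation should not rely on a column that another row of `xform` creates in the same call.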
eddi
  • Yeah, that's what I was going to propose too, based on my findings in the comment, but I thought that you and @Arun were going to laugh at me :) – David Arenburg Jun 10 '14 at 20:08
  • @DavidArenburg fair enough - another option added :) – eddi Jun 10 '14 at 20:15
  • (+1) Awesome. I think you should delete the loop one, and just leave the second solution. I liked how you've put `.SD` at the end and operated on `xform$val`, I would never have come up with that – David Arenburg Jun 10 '14 at 20:20
  • Ok, so it looks like I was close: just parse the transformations with `xform[,expr := lapply(val, FUN=function(x) parse(text=x))]` and then eval them with `tt[,xform$var := lapply(xform$expr, eval, .SD)]`. What is the magic of `.SD` for, since there's no `by` or `keyby`? – Grisby_2133 Jun 10 '14 at 20:48
  • @Grisby_2133 see [this post](http://stackoverflow.com/questions/15913832/eval-and-quote-in-data-table) for explanation about the magic of `.SD` – eddi Jun 10 '14 at 20:53
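
On the `.SD` question, a small sketch may help; it is an illustration under simplified assumptions (a two-row table, and the made-up names `e`, `res`, `cols` and `out`), not a description of data.table internals. The second argument of `eval()` can be a list or data.frame in which symbols are looked up, and with no `by`, `.SD` holds all of the table's columns, so it can serve as that lookup environment.

```r
library(data.table)

tt <- data.table(date = c(2011, 2012), count = c(10, 20))
e  <- quote(date - 2000)   # an unevaluated call; `date` is not looked up yet

# Base R: eval() resolves symbols in a list or data.frame passed as `envir`
res <- eval(e, list(date = c(2011, 2012)))

# With no `by`, .SD holds all of tt's columns
cols <- tt[, names(.SD)]

# So passing .SD to eval() lets the call find `date` among tt's columns
out <- tt[, lapply(list(date2 = e), eval, .SD)]
```

Here `res` and `out$date2` hold the same values, because both evaluations find `date` in the data that was passed to `eval()`.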