Note: The precise problem I hit in this question does not apply to recent versions of data.table. If you want to do something like what is described in the title, check out the corresponding question in the package FAQ: 1.6, "OK, but I don't know the expressions in advance. How do I programatically pass them in?"

I have seen an answer that illustrates how to construct an expression to be evaluated in

DT[,j=eval(expr)]

I am using this with an assignment, `:=`(mycol=my_calculation), and I'm wondering...

  • How can I assign the name "mycol" dynamically?
  • What is the correct way to let "my_calculation" take a dynamically-determined set of columns?

By "dynamically", I mean "determined after I write the code for my expr".
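For reference, the `j=eval(expr)` pattern mentioned above can be sketched as follows. This is a minimal illustration with a fixed expression, assuming a recent data.table; the object name `res` is only for illustration. It does not yet address the dynamic column names asked about here.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

# setup: capture the calculation as an unevaluated call
expr <- quote(a + b)

# usage: evaluate the stored call in j; columns a and b are looked up in DT
res <- DT[, eval(expr)]
```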

New example

EDIT: To better illustrate the issue, here is a different example. Look in the edit history to see the original.

require(data.table)
require(plyr)
options(datatable.verbose=TRUE)
DT <- CJ(a=0:1,b=0:1,y=2)

# setup:
expr  <- as.quoted(paste(expression(get(col_in_one)+get(col_in_two))))[[1]]

# usage: 
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,(col_out):=eval(expr)] # fails, should take the form j=eval(expr)

I want to keep the setup and usage stages separate, so my code is easier to maintain. My real expression is messier than this example (where it just adds two columns).

Questions

First question: How can I make the assigned-to column, "col_out", dynamic? I mean: I want to specify both "col_in_*" and "col_out" on the fly.

I have tried creating various expressions in "expr", but as.quoted throws an error about not putting certain stuff to the left of the = symbol.

Second question: How can I avoid the warnings against using get?

The warnings suggest using .SDcols, to let [.data.table know which columns I am using. However, if I use the .SDcols argument, another warning says there's no point doing that unless .SD is being used.

Tentative solution

The solutions I have so far are...

# Ricardo + eddi:
expr2 <- as.quoted(paste(expression(`:=`(
  Vtmp=.SD[[col_in_one]]+.SD[[col_in_two]]))))[[1]]

# usage
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,eval(expr2),.SDcols=c(col_in_one,col_in_two)]
setnames(DT,'Vtmp',col_out)

This still involves the minor annoyance of doing the operation in two steps and keeping track of "Vtmp", so the first question is still partly open.
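One possible way to avoid the temporary "Vtmp" column, sketched here as an alternative rather than as the method from the answers, is to keep a template call with placeholder symbols and fill in the real column names with base R's `substitute` at usage time. The placeholder names `OUT`, `IN_ONE` and `IN_TWO` and the object names `template` and `filled` are hypothetical.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

# setup: a template call; OUT, IN_ONE and IN_TWO are placeholder symbols
template <- quote(DT[, OUT := IN_ONE + IN_TWO])

# usage: choose the columns on the fly
col_in_one <- "a"
col_in_two <- "b"
col_out    <- "bah"

# replace each placeholder with the symbol for the chosen column,
# giving the call  DT[, bah := a + b]
filled <- do.call(substitute, list(
  template,
  list(OUT    = as.name(col_out),
       IN_ONE = as.name(col_in_one),
       IN_TWO = as.name(col_in_two))
))

# evaluate the completed call; this adds column "bah" to DT by reference
invisible(eval(filled))
```

Because the evaluated call names the columns directly, it uses neither `get` nor `.SD`, so the second question does not come up with this approach.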

Frank

2 Answers

Maybe I don't understand the problem well, but does this suffice:

DT[, (col_out) := .SD[[col_in_one]]+.SD[[col_in_two]],
     .SDcols = c(col_in_one,col_in_two)]
DT
#   a b y bah
#1: 0 0 2   0
#2: 0 1 2   1
#3: 1 0 2   1
#4: 1 1 2   2
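
If the same calculation is needed in several places, the call above could be wrapped in a small helper. This is only a sketch: the function name `add_sum_col` and the second data.table `DT2` are made up for illustration.

```r
library(data.table)

# Hypothetical helper: adds column `col_out` holding the sum of two input columns.
# Only the columns listed in .SDcols are materialised in .SD.
add_sum_col <- function(DT, col_in_one, col_in_two, col_out) {
  DT[, (col_out) := .SD[[col_in_one]] + .SD[[col_in_two]],
     .SDcols = c(col_in_one, col_in_two)]
  invisible(DT)
}

DT2 <- CJ(a = 0:1, b = 0:1, y = 2)
add_sum_col(DT2, "a", "b", "bah")       # bah      = a + b
add_sum_col(DT2, "b", "y", "b_plus_y")  # b_plus_y = b + y
```

Since `:=` modifies the table by reference, `DT2` gains both columns without reassignment.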

To answer the edited question, to get the eval to work, use .SD as environment:

DT[, (col_out) := eval(expr, .SD)]

Also, see the question "eval and quote in data.table" and the update there.

eddi
  • Something like this might be okay, but I would prefer to keep my expression and my use of it (possibly in multiple places) separate. Also, this ought to be slow, since you're creating `.SD` and also calling `[.data.table` for "th", right? Is `.SD[[x]]` better than `get(x)`? ...Ok then, Ricardo's link explained that it is better than `get`. – Frank Oct 09 '13 at 16:00
  • @Frank sorry I don't think I understand what you want to do - a simpler example that concentrates on the issue would help. Re `get`: using `.SD` with `.SDcols` is better than a `get`, because in the first case only the columns in `.SDcols` get constructed for `.SD`; and since here all of `.SD` is used, there should be no overhead for using `.SD` (but it's probably possible to do the whole reduce business better) – eddi Oct 09 '13 at 16:03
  • Thanks for suggesting it. I've added some bold text (which I hate doing) and changed the example. I'll edit your answer to match the new example. – Frank Oct 09 '13 at 16:48
  • Ok, so now I'm using `DT[,(col_out):=eval(expr,.SD),.SDcols=c(col_in_one,col_in_two)]`, where "expr" is my expression with `get`s in it. Problem solved! – Frank Oct 09 '13 at 17:11
  • @Frank I suggest heeding the warning and using `.SD[[col]]` instead of `get` (otherwise all the variables are constructed in `.SD`) – eddi Oct 09 '13 at 17:15
  • Okay, then I'll use `expr3 <- as.quoted(paste(expression(.SD[[1]]+.SD[[2]])))[[1]]` instead (or the equivalent expression with names inside [[]]). – Frank Oct 09 '13 at 17:22

The simplest way is to set the column name AFTER you evaluate the expression. After all, the time to execute that is constant and nearly 0.

someDummyVar <- "tempColName_XCWF5D"
DT [, (someDummyVar) := eval(expr) ]

setnames(DT, someDummyVar, RealColumnName)

As for question two: Don't turn on verbose warnings and you won't get verbose warnings ;)

options(datatable.verbose=FALSE)

As for Reduce: try posting that as a separate and simplified question so that it is easier to follow what you are doing (outside of the eval issues).

Ricardo Saporta
  • +1. Yeah, that's true; I just don't want to have to maintain twice as much code to run the operation. I guess I can bundle them into a function. Any ideas about #2 -- avoiding the `get` warning; or #3 -- giving `eval` the linear expression itself instead of a `Reduce` that it must convert... – Frank Oct 09 '13 at 15:42
  • @Frank, have a look at the question linked above – Ricardo Saporta Oct 09 '13 at 15:47
  • I have taken out the Reduce part and will post it as a separate question. Thanks! – Frank Oct 09 '13 at 16:52