
This is in data.table 1.9.4.

context

I'm wrapping an ML training operation in a function, and I want to get the levels of a column of a data.table that is passed in, with the column name supplied as a string. I noticed that this only works if the column name is dequoted using get().

A minimal example to demonstrate failing approaches:

library(data.table)
test.table <- data.table(col1 = rep(c(0,1), times = 10), col2 = 1:20)
col.id <- "col1"

> str(test.table[,levels(col.id),with=FALSE])

Classes ‘data.table’ and 'data.frame':  0 obs. of  0 variables
 - attr(*, ".internal.selfref")=<externalptr>

> str(test.table[,levels(factor(col.id)), with=FALSE])
Classes ‘data.table’ and 'data.frame':  20 obs. of  1 variable:
 $ col1: num  0 1 0 1 0 1 0 1 0 1 ...
 - attr(*, ".internal.selfref")=<externalptr>

> str(test.table[,levels(as.factor(col.id)), with=FALSE])
Classes ‘data.table’ and 'data.frame':  20 obs. of  1 variable:
 $ col1: num  0 1 0 1 0 1 0 1 0 1 ...
 - attr(*, ".internal.selfref")=<externalptr>

> levels(test.table[,factor(col.id), with=FALSE])
NULL

> levels(test.table[,as.factor(col.id), with=FALSE])
NULL

And yet, test.table[,col.id, with = FALSE] is a valid way to access the column.

Here are some things that work:

> test.table[,levels(as.factor(get(col.id)))]
[1] "0" "1"
> test.table[,levels(factor(get(col.id)))]
[1] "0" "1"
> levels(test.table[,factor(get(col.id))])
[1] "0" "1"

Why is get() needed here, when with = FALSE is enough for plain column selection? Is this intended?
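
Part of the behaviour above can be reproduced in base R alone, which may help narrow the question down. As far as I understand it, with = FALSE makes data.table evaluate j as an ordinary expression and treat the result as column names (or positions) to select, so the calls above are really operating on the string "col1" and not on the column. A minimal sketch of just that step:

```r
col.id <- "col1"

# A plain character vector has no levels, so this is NULL.
# Used as a column selection, NULL selects nothing, which would
# match the "0 obs. of 0 variables" result above.
lv_string <- levels(col.id)

# Turning the string into a factor gives a factor whose only level
# is the string itself, i.e. the column name "col1". Used as a column
# selection, that picks the whole col1 column, not its levels.
lv_factor <- levels(factor(col.id))

print(lv_string)
print(lv_factor)
```

Under that reading, get(col.id) differs because it is evaluated inside j with the columns visible, so it returns the column's values before factor() and levels() are applied.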

bright-star
  • How can we know whether you're referencing a column within the frame of data.table or the column stored in the variable you're using? Imagine there's a column called `col.id` in `data.table`. How do we decide which one you want? – Arun Jan 14 '15 at 09:10
  • That's true. I have gotten very used to R and `data.table` "doing the right thing" for me ;) – bright-star Jan 14 '15 at 09:12
  • `eval(as.name(col.id))` would probably be more efficient than `get`; you can also try `.SD[[col.id]]`. – David Arenburg Jan 14 '15 at 09:31
  • I've been using `get` for readability and simplicity. I'm not smart enough for `eval`, `deparse`, etc. yet – bright-star Jan 14 '15 at 09:33
  • `.SD[[target]]` is the idiomatic way but it is not yet optimized unfortunately, you could blame Arun for that :) – David Arenburg Jan 14 '15 at 09:35
  • @DavidArenburg If you can expand that into a declarative answer, I'll be more than happy. – bright-star Jan 14 '15 at 09:39
  • I actually asked a [**similar question**](http://stackoverflow.com/questions/27677283/evaluating-both-column-name-and-the-target-value-within-j-expression-within-d) not long ago, so I'm just summarizing the conclusion here. I guess we could possibly close this duplicate if you agree. – David Arenburg Jan 14 '15 at 09:41
  • Hmm. I wish there was an automatic merge. – bright-star Jan 14 '15 at 09:44
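
The alternatives mentioned in the comments can be put side by side in a short sketch. It reuses `test.table` and `col.id` from the question; `column_levels` is a hypothetical helper name introduced only for illustration, and the `[[` variant is an extra option that is not from the thread. No claim is made here about which variant is fastest.

```r
library(data.table)

test.table <- data.table(col1 = rep(c(0, 1), times = 10), col2 = 1:20)
col.id <- "col1"

# get() resolves the string to the column inside j (the approach from the question).
lv_get <- test.table[, levels(factor(get(col.id)))]

# eval(as.name()) builds a symbol from the string and evaluates it inside j.
lv_eval <- test.table[, levels(factor(eval(as.name(col.id))))]

# .SD is the subset of data available in j; [[ ]] extracts one column by name.
lv_sd <- test.table[, levels(factor(.SD[[col.id]]))]

# Outside of j, [[ ]] on the table itself also returns the column vector.
lv_plain <- levels(factor(test.table[[col.id]]))

# Hypothetical wrapper for use inside a larger function:
# dt is a data.table (or data.frame), col is a column name given as a string.
column_levels <- function(dt, col) {
  levels(factor(dt[[col]]))
}

print(column_levels(test.table, col.id))
```

Using `dt[[col]]` in the wrapper sidesteps the ambiguity Arun raises, since the string can only be interpreted as a column name there.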
