This is in data.table 1.9.4.
Context
I'm wrapping an ML training operation in a function, and I want to get the levels of a column of a data.table that has been passed in. The column name is passed as a string, and I noticed that it has to be dequoted using get() before levels() returns what I expect.
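For reference, the wrapper looks roughly like the sketch below. The function name `get_levels` and its argument names are hypothetical, chosen only for illustration; the real training code is omitted.

```r
library(data.table)

# Hypothetical helper: return the factor levels of one column of a
# data.table, where the column name is supplied as a string.
get_levels <- function(dt, col.name) {
  # get() resolves the string to the column of that name inside j
  dt[, levels(factor(get(col.name)))]
}

test.table <- data.table(col1 = rep(c(0, 1), times = 10), col2 = 1:20)
get_levels(test.table, "col1")
```

This is the form that works for me; the question is why the get() is needed.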
A minimal example to demonstrate failing approaches:
> library(data.table)
> test.table <- data.table(col1 = rep(c(0,1), times = 10), col2 = 1:20)
> col.id <- "col1"
> str(test.table[,levels(col.id),with=FALSE])
Classes ‘data.table’ and 'data.frame': 0 obs. of 0 variables
- attr(*, ".internal.selfref")=<externalptr>
> str(test.table[,levels(factor(col.id)), with=FALSE])
Classes ‘data.table’ and 'data.frame': 20 obs. of 1 variable:
$ col1: num 0 1 0 1 0 1 0 1 0 1 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(test.table[,levels(as.factor(col.id)), with=FALSE])
Classes ‘data.table’ and 'data.frame': 20 obs. of 1 variable:
$ col1: num 0 1 0 1 0 1 0 1 0 1 ...
- attr(*, ".internal.selfref")=<externalptr>
> levels(test.table[,factor(col.id), with=FALSE])
NULL
> levels(test.table[,as.factor(col.id), with=FALSE])
NULL
And yet, test.table[,col.id, with = FALSE] is a valid way to access the column.
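One observation that may be relevant: evaluated on their own, outside the data.table, the j expressions from the failing calls give the results below. This is plain base R and only a sketch of what j might be receiving when with=FALSE is set, not a confirmed description of data.table's internals.

```r
col.id <- "col1"

levels(col.id)             # NULL: a character vector has no levels
levels(factor(col.id))     # "col1": the only level is the string itself
levels(as.factor(col.id))  # "col1"
```

If with=FALSE treats the value of j as a vector of column names, that would be consistent with the empty result for the first call and with col1 being selected in the other two.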
Here are some things that work:
> test.table[,levels(as.factor(get(col.id)))]
[1] "0" "1"
> test.table[,levels(factor(get(col.id)))]
[1] "0" "1"
> levels(test.table[,factor(get(col.id))])
[1] "0" "1"
Why is get() needed here, when with = FALSE is enough to select the column by name? Is this behaviour intended?