35

raw is a data.table and the following code works:

raw[,r_responseTime] #Returns the whole column
raw[,c_filesetSize]  #Same as above, returns column
plot(raw[,r_responseTime]~raw[,c_filesetSize]) #draws something

Now I want to specify these columns from a string, so for example:

col1="r_responseTime"
col2="c_filesetSize"

How can I now achieve the same as above while referencing the columns by the string?

raw[,col1] #Should return the whole column
raw[,col2]  #Same as above, should return the column
plot(raw[,col1]~raw[,col2]) #should draw something

This does not work, of course, because the strings need some kind of dereferencing. I didn't know what to search for in the help or online, so sorry if this is a basic question.

Matt Dowle
theomega
  • In addition to the answers, try `with=FALSE`. Also, see FAQs 1.5, 1.6 and 1.7. – Matt Dowle Mar 26 '12 at 11:06
  • `with=FALSE` does not seem to work with the `by` argument, any solution for that? – tlamadon Feb 26 '13 at 01:01
  • Well, actually, a vector of strings works out of the box in the `by` argument. – tlamadon Feb 26 '13 at 01:11
  • Man, this is a *really* annoying part of data.table... If you write it one way, it works with dataframes, and if you fix it for data.table, it fails for dataframes. Is there no general solution? – naught101 Sep 10 '14 at 06:17
  • @naught101 I use standard base R `raw[[col1]]` for selecting a single column as a vector from a data.table where `col1` contains which one. I don't see why people are trying to use data.table `[...]` for that. The NEWS items explicitly recommend `[[` and `$` on data.table where whole columns are required as vectors. Maybe this advice needs to be added to `?data.table`. – Matt Dowle Feb 07 '17 at 23:56
  • @naught101 More annoying was that `DT[,1]`, `DT[,3:10]` and `DT[,colP:colW]` didn't work before. They all work now in recent versions to alleviate that annoyance. Without losing the convenience and power that `j` can be expressions of column names directly. – Matt Dowle Feb 08 '17 at 00:04
  • @Frank how is this a duplicate of a question asked later? – theomega Mar 07 '19 at 14:47
  • The dupe target has an answer by the package's author that is better maintained (eg, includes the `..x` notation). Looking around on meta, it seems like this sort of closure is regarded as okay https://meta.stackexchange.com/a/147651 – Frank Mar 07 '19 at 15:08

4 Answers

36

It would be nice if you had provided a reproducible example, or at the very least shown what the column names of raw are and what r_responseTime and c_filesetSize contain. This being said, get is your function for dereferencing so give these a try:

raw[, get(col1)]
raw[, get(col2)]
plot(raw[, get(col1)] ~ raw[, get(col2)])
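
Since the question has no reproducible data, here is a minimal sketch with a made-up table. Only the column names come from the question; the values are invented. It shows get() in j next to the base R `[[` form recommended in the comments for pulling one column out as a vector.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values are invented for illustration.
raw <- data.table(
  r_responseTime = c(12.5, 30.1, 47.8),
  c_filesetSize  = c(100, 200, 300)
)

col1 <- "r_responseTime"
col2 <- "c_filesetSize"

# get() resolves the string to the column inside j and returns a vector.
v_get <- raw[, get(col1)]

# Base R double-bracket extraction returns the same vector.
v_bracket <- raw[[col1]]

# Plain vectors can go straight to plot(); x is the fileset size,
# y is the response time, matching the formula in the question.
plot(x = raw[[col2]], y = raw[[col1]])
```

The `[[` form also works unchanged on a plain data.frame, which may matter if the same code has to handle both.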
flodel
10

A modern approach is to use ..:

raw[ , ..col1]

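The .. prefix "looks up a level": it tells data.table to find col1 in the calling scope instead of among the columns. Note that the result is a one-column data.table, not a vector.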


An older, less preferred alternative is to use the match() function or %in% operator.

raw[, match(col1, names(raw)), with=FALSE]
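
As a rough comparison of the spellings mentioned here and in the comments, the sketch below uses a made-up table (values invented, column names from the question). It assumes a data.table version recent enough to support the `..` prefix.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values are invented for illustration.
raw <- data.table(
  r_responseTime = c(12.5, 30.1),
  c_filesetSize  = c(100, 200)
)
col1 <- "r_responseTime"

# The .. prefix looks col1 up in the calling scope: one-column data.table.
a <- raw[, ..col1]

# with = FALSE treats j as column names or positions rather than an expression.
b <- raw[, col1, with = FALSE]

# Base R extraction when a plain vector is wanted instead.
v <- raw[[col1]]
```

The first two return a data.table, so they suit further data.table operations; the last returns the bare column, which is what functions like plot() usually expect.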
MichaelChirico
Etienne Low-Décarie
9

If you have a vector of strings, you can use mget:

cols = c("r_responseTime", "c_filesetSize")
raw[, mget(cols)]
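
A small sketch of what this returns, again on a made-up table (values invented). mget() gives back a named list, which data.table turns into a data.table when it is the result of j; the `..` prefix from the other answer is shown alongside for comparison and assumes a version that supports it.

```r
library(data.table)

# Hypothetical stand-in for `raw`; the values are invented for illustration.
raw <- data.table(
  r_responseTime = c(12.5, 30.1),
  c_filesetSize  = c(100, 200),
  other          = c("a", "b")
)
cols <- c("r_responseTime", "c_filesetSize")

# mget() fetches each named column; the list becomes a two-column data.table.
sub_mget <- raw[, mget(cols)]

# Same selection using the .. prefix.
sub_dots <- raw[, ..cols]
```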
tadejsv
  • Thank you for this answer!! This is the only thing that worked for me (more specific situation) and I was searching a good 30 min before anyone mentioned `mget`. – daniel.s Jun 22 '23 at 16:06
1

Unfortunately "get" can be problematic! See example below:

library(data.table)

m = 100
x1 = sample(c("cat", "dog", "horse"), m, replace=TRUE)
y1 = rnorm(m)
fill1 = sample(c("me", "myself", "dude"), m, replace=TRUE)
df = data.frame("x"=x1, "y"=y1, "fill"=fill1)
dt = data.table(df)

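get does not work when the variable has the same name as a column, because inside j the name y resolves to the column rather than to the string: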

y = "y"
dt[ , get(y)]

get works when the variable name differs from every column name:

yCol = "y"
dt[ , get(yCol)]

This always works, but it's not pretty:

eval(parse(text = paste0("values = dt[ ,",  y, "]")))
eval(parse(text = paste0("values = dt[ ,",  yCol, "]")))
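
As a possible alternative to eval(parse()), the sketch below sidesteps the name clash with the two forms mentioned elsewhere on this page: base R `[[` and the `..` prefix. The table is a smaller made-up version of the one above, and the `..` form assumes a data.table version that supports it.

```r
library(data.table)

# Smaller hypothetical version of the table above; values are invented.
dt <- data.table(x = c("cat", "dog"), y = c(1.5, 2.5))

# The variable deliberately shares its name with the column.
y <- "y"

# `[[` never evaluates y among the columns, so the clash does not arise.
v <- dt[[y]]

# The .. prefix looks y up in the calling scope; returns a one-column data.table.
d <- dt[, ..y]
```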
Suraj Rao
rz1317