I have a data.table that I want to filter based on some inequality criteria:

dt <- data.table(A=letters[1:3], B=2:4)
dt
#    A B
# 1: a 2
# 2: b 3
# 3: c 4

dt[B>2]
#    A B
# 1: b 3
# 2: c 4

The above works well as a vector scan solution. But I can't work out how to combine this with variable names for the columns:

mycol <- "B"
dt[mycol > 2]
#    A B      // Nothing has changed
# 1: a 2
# 2: b 3
# 3: c 4

How do I work around this? I know I can use binary search by setting keys using setkeyv(dt, mycol) but I can't see a way of doing a binary search based on some inequality criteria.

MattLBeck

3 Answers

Use get(mycol), because you want the argument to dt[ to be the contents of the object named by mycol, i.e. column B. I believe dt[mycol ...] looks for a "mycol" thingie in the data.table object itself, of which of course there is no such animal, so the comparison ends up being made against the string "B" rather than against the column.
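
A minimal sketch of the get() approach, assuming the dt and mycol defined in the question (the name res is only for illustration):

```r
library(data.table)

dt <- data.table(A = letters[1:3], B = 2:4)
mycol <- "B"

# get() looks up the object whose name is stored in mycol. Inside dt[...]
# the columns of dt are searched first, so this resolves to column B.
res <- dt[get(mycol) > 2]
print(res)
```

This should keep only the rows where B exceeds 2, the same rows as dt[B > 2] in the question.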

Carl Witthoft
  • This saved me. But even if you do `dt['B' > 2]`, you have to do `dt[get('B') > 2]`. Kind of annoying/seems like column references should work as a string. – wordsforthewise Aug 27 '17 at 20:57
  • @wordsforthewise Thanks for the correction. I have to admit I haven't looked at this problem in the last 4 years! – Carl Witthoft Aug 28 '17 at 11:25

There is an accessor provided for this. j is evaluated in the frame of X, i.e. your data.table, unless you specify with = FALSE, in which case mycol is treated as a variable holding the column name. This would be the canonical way of doing this.

dt[ , mycol , with = FALSE ]
   B
1: 2
2: 3
3: 4

Return column, logical comparison, subset rows...

dt[ c( dt[ , mycol , with = FALSE ] > 2 ) ]
Simon O'Hanlon
  • @Mattrition see update. I don't like `get` because you might encounter some unexpected behaviour evaluating this in some nested calling environments. I'd just figure that `with = FALSE` would be provided for a reason. – Simon O'Hanlon Dec 13 '13 at 17:37
  • Ok, your answer now provides an alternative solution, so thank you. A third alternative I found is `dt[dt[[mycol]] > 2]`. – MattLBeck Dec 13 '13 at 17:37
  • @Mattrition that seems the most obvious. You should answer your own question with that. – Simon O'Hanlon Dec 13 '13 at 17:38
  • I really dislike both this and the `[[` alternatives, because they are a lot more error prone (since you have to specify the `data.table` name twice) – eddi Dec 13 '13 at 18:41

Another alternative is to use [[ to retrieve B as a vector, and subset using this:

dt[dt[[mycol]] > 2]
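
As a quick sanity check, the three approaches suggested on this page can be compared side by side. This is only a sketch, assuming the dt and mycol from the question; the names by_get, by_with and by_brackets are made up for illustration:

```r
library(data.table)

dt <- data.table(A = letters[1:3], B = 2:4)
mycol <- "B"

# 1. get(): resolve the column by name inside i
by_get <- dt[get(mycol) > 2]

# 2. with = FALSE: pull the column out as a one-column data.table,
#    compare, and flatten the logical matrix to a vector with c()
by_with <- dt[c(dt[, mycol, with = FALSE] > 2)]

# 3. [[: pull the column out as a plain vector
by_brackets <- dt[dt[[mycol]] > 2]

# all.equal() is used rather than identical() to compare data.tables
print(isTRUE(all.equal(by_get, by_with)))
print(isTRUE(all.equal(by_get, by_brackets)))
```

Only the get() form avoids naming dt twice, which is the concern raised in the comments above.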
MattLBeck