3

Suppose I have a data.table with columns names that are specified in a variable. For example I might have used dcast as:

groups <- sample(LETTERS, 2) # i.e. I don't now the values
dt1 <- data.table(ID = rep(1:2, each = 2), group = groups, value = 3:6)
(dt2 <- dcast(dt1, ID~group, value.var = "value"))
#    ID D Q
# 1:  1 3 4
# 2:  2 5 6

Now I want to subset based on values in the last two columns, e.g. do something like:

dt2[groups[1] == 3 & groups[2] == 4]
# Empty data.table (0 rows) of 3 cols: ID,D,Q

Is there an easy way? I found I can do this with keys:

setkeyv(dt2, groups)
dt2[.(3, 4)]
#    ID D Q
# 1:  1 3 4

But how do I do something more elaborate, as

dt2[groups[1] > 3 & groups[2] < 7]

?

Max
  • 514
  • 2
  • 12
  • Best to use `set.seed` before making a random example. As you can see in the answer, they can't recreate your example... (having columns A, J instead). – Frank Aug 10 '16 at 14:14
  • @Frank In general I agree, but it doesn't really matter in this case. A random example was used to underline that I don't know the variable names in advance. – Max Aug 10 '16 at 14:36
  • 1
    @akrun There's one sense in which it is not a dupe. We now (since 1.9.7) have non-equi joins, which is what the OP could be using here to filter by conditions like `groups[1] > 3 & groups[2] < 7` with syntax like `dt2[.(3,7), on=.(D > V1, Q < 7)]`, taking advantage of binary search. However, there is no way to pass the names D and Q to `on=` dynamically yet, even with the method covered in the link, I think. – Frank Aug 10 '16 at 18:23
  • @akrun And of course I don't have anything to say about "correct" use of votes. If you are being ganged up on or otherwise targeted, take it to the mods. – Frank Aug 10 '16 at 18:24
  • @akrun Well, mods are authorized to escalate an issue to SO staff attention, and they can see such things, I guess. Anyways, no reason for us to be discussing this under Max's question. – Frank Aug 10 '16 at 18:30

2 Answers2

7

You can use get to (from ?get)

search by name for an object

:

dt2[get(groups[1]) > 2 & get(groups[2]) == 4]
#   ID A J
#1:  1 3 4
Psidom
  • 209,562
  • 33
  • 339
  • 356
2

We can use eval with as.name and it should be faster than get

dt2[eval(as.name(groups[1])) > 2 & eval(as.name(groups[2])) == 4]
#   ID L U
#1:  1 4 3
akrun
  • 874,273
  • 37
  • 540
  • 662