Subset in i by variable name in data.table

Question

Suppose I have a data.table with columns names that are specified in a variable. For example I might have used dcast as:

groups <- sample(LETTERS, 2) # i.e. I don't now the values
dt1 <- data.table(ID = rep(1:2, each = 2), group = groups, value = 3:6)
(dt2 <- dcast(dt1, ID~group, value.var = "value"))
#    ID D Q
# 1:  1 3 4
# 2:  2 5 6

Now I want to subset based on values in the last two columns, e.g. do something like:

dt2[groups[1] == 3 & groups[2] == 4]
# Empty data.table (0 rows) of 3 cols: ID,D,Q

Is there an easy way? I found I can do this with keys:

setkeyv(dt2, groups)
dt2[.(3, 4)]
#    ID D Q
# 1:  1 3 4

But how do I do something more elaborate, as

dt2[groups[1] > 3 & groups[2] < 7]

?

Best to use `set.seed` before making a random example. As you can see in the answer, they can't recreate your example... (having columns A, J instead). — Frank, Aug 10 '16 at 14:14
@Frank In general I agree, but it doesn't really matter in this case. A random example was used to underline that I don't know the variable names in advance. — Max, Aug 10 '16 at 14:36
@akrun There's one sense in which it is not a dupe. We now (since 1.9.7) have non-equi joins, which is what the OP could be using here to filter by conditions like `groups[1] > 3 & groups[2] < 7` with syntax like `dt2[.(3,7), on=.(D > V1, Q < 7)]`, taking advantage of binary search. However, there is no way to pass the names D and Q to `on=` dynamically yet, even with the method covered in the link, I think. — Frank, Aug 10 '16 at 18:23
@akrun And of course I don't have anything to say about "correct" use of votes. If you are being ganged up on or otherwise targeted, take it to the mods. — Frank, Aug 10 '16 at 18:24
@akrun Well, mods are authorized to escalate an issue to SO staff attention, and they can see such things, I guess. Anyways, no reason for us to be discussing this under Max's question. — Frank, Aug 10 '16 at 18:30

score 7 · Accepted Answer · answered Aug 10 '16 at 13:55

7

You can use get to (from ?get)

search by name for an object

:

dt2[get(groups[1]) > 2 & get(groups[2]) == 4]
#   ID A J
#1:  1 3 4

answered Aug 10 '16 at 13:55

Psidom

209,562
33
339
356

Yes, this makes sense, thanks! – Max Aug 10 '16 at 13:59
@ Max thanks for the update deleted my post – Arun kumar mahesh Aug 10 '16 at 14:43

score 2 · Answer 2 · answered Aug 10 '16 at 15:59

2

We can use eval with as.name and it should be faster than get

dt2[eval(as.name(groups[1])) > 2 & eval(as.name(groups[2])) == 4]
#   ID L U
#1:  1 4 3

answered Aug 10 '16 at 15:59

akrun

874,273
37
540
662

Subset in i by variable name in data.table

2 Answers2