Combining vector and binary search in data.table

Question

Sometimes, I have a keyed data.table which I'd like to subset according to its key and an unkeyed column. What's the simplest/fastest way to do this?

The syntax that feels most natural throws an error:

dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
dt[.(seq(1, 100, 2)) & var > 0, ]

The next cleanest thing is to chain:

dt[.(seq(1, 100, 2))][var > 0, ]

And of course we can ditch binary search altogether and fall back on a plain vector scan (I think this is clearly to be avoided):

dt[id %in% seq(1, 100, 2) & var > 0, ]

Is there an approach I'm missing? Also, any particular reason why the first is an error? The syntax seems clear enough to me.

I'm betting on the "clean" chain. If your second condition is an inequality, I doubt the current system of indexing can help. There is "auto indexing" on equality conditions now, but I'm not sure about the details. It's mentioned in the news: https://github.com/Rdatatable/data.table. If you need to do a `by=.EACHI` with your subset, you'll have to switch the chain around, I guess: `dt[var>2][.(seq(1,100,2)),...do stuff...,by=.EACHI]` — Frank, May 14 '15 at 01:09
see comments [here](http://stackoverflow.com/a/29668066/817778) — eddi, May 14 '15 at 03:51
so it seems like the answer really depends on what I want to do in `j`, is that safe to say? — MichaelChirico, May 14 '15 at 04:26
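For the `by=.EACHI` case raised in the comments, here is a minimal sketch of the switched-around chain. It assumes the question's `dt`; the seed, the `odd_ids` and `counts` names, the `var > 0` threshold, and the choice of `.N` as the `j` expression are all illustrative and not from the thread.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
odd_ids <- seq(1, 100, 2)  # illustrative name for the ids being looked up

# Vector scan on var first, then the keyed join on id,
# evaluating j once per row of i via by = .EACHI
counts <- dt[var > 0][.(odd_ids), .N, by = .EACHI]
```

The result has one row per looked-up id, so ids whose `var` failed the filter still appear in the output rather than being dropped.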

score 0 · Accepted Answer · answered Dec 21 '15 at 02:45

As of this writing, the native way to express:

dt[.(seq(1, 100, 2)) & var > 0, j] # where j is some expression

is the following:

dt[.(seq(1, 100, 2)), .SD[var > 0, j]]

The more I work with data.table, the more natural this is, but it still looks a bit unintuitive. C'est la vie.
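As a quick sanity check, the sketch below compares the `.SD` form with the chain and the `%in%` scan from the question. The seed, the variable names, and the `mean(var)` expression standing in for `j` are assumptions made for illustration only.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
odd_ids <- seq(1, 100, 2)  # illustrative name for the ids being looked up

# Chain: keyed join on id, then a vector scan on var
chained <- dt[.(odd_ids)][var > 0]

# Single call: the same keyed join, with the var filter applied to .SD in j
nested <- dt[.(odd_ids), .SD[var > 0]]

# Pure vector scan, for comparison
scanned <- dt[id %in% odd_ids & var > 0]

# With an actual j expression (here mean(var), chosen only as an example)
chained_mean <- dt[.(odd_ids)][var > 0, .(mean_var = mean(var))]
nested_mean  <- dt[.(odd_ids), .SD[var > 0, .(mean_var = mean(var))]]
```

All three subsets should contain the same rows; attributes such as the key may differ between them, so compare column values rather than relying on `identical()` over the whole tables.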
