Sometimes, I have a keyed data.table which I'd like to subset according to its key and an unkeyed column. What's the simplest/fastest way to do this?

The syntax that feels most natural throws an error:

dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
dt[.(seq(1, 100, 2)) & var > 0, ]

The next cleanest thing is to chain:

dt[.(seq(1, 100, 2))][var > 0, ]

And of course we can skip binary search altogether (I think this is clearly to be avoided):

dt[id %in% seq(1, 100, 2) & var > 0, ]
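As a quick sanity check (a sketch, not a benchmark), the chained form and the vector-scan form should select the same rows. The seed and the variable name `odd_ids` are assumptions added here for reproducibility and readability:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
odd_ids <- seq(1, 100, 2)  # hypothetical name for the key values of interest

# Binary search on the key, then filter the result on the unkeyed column
chained <- dt[.(odd_ids)][var > 0]

# Single vector scan over both conditions (no binary search)
scanned <- dt[id %in% odd_ids & var > 0]

# Both should contain the same ids and the same values, in the same order
nrow(chained) == nrow(scanned)
all(chained$id == scanned$id)
all.equal(chained$var, scanned$var)
```

This only compares results; which form is faster likely depends on the table size and on how selective each condition is.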

Is there an approach I'm missing? Also, any particular reason why the first is an error? The syntax seems clear enough to me.

MichaelChirico
  • I'm betting on the "clean" chain. If your second condition is an inequality, I doubt the current system of indexing can help. There is "auto indexing" on equality conditions now, but I'm not sure about the details. It's mentioned in the news: https://github.com/Rdatatable/data.table. If you need to do a `by=.EACHI` with your subset, you'll have to switch the chain around, I guess: `dt[var>2][.(seq(1,100,2)),...do stuff...,by=.EACHI]` – Frank May 14 '15 at 01:09
  • see comments [here](http://stackoverflow.com/a/29668066/817778) – eddi May 14 '15 at 03:51
  • so it seems like the answer really depends on what I want to do in `j`, is that safe to say? – MichaelChirico May 14 '15 at 04:26

1 Answer

As of this writing, the native way to express:

dt[.(seq(1, 100, 2)) & var > 0, j] # for some expression j

is the following:

dt[.(seq(1, 100, 2)), .SD[var > 0, j]]

The more I work with data.table, the more natural this is, but it still looks a bit unintuitive. C'est la vie.
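To make this concrete, here is a minimal sketch with a specific `j`. The choice of `mean(var)` and the fixed seed are assumptions for illustration only; the chained form from the question is included to check that the two agree:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")

# Hypothetical j: the mean of the positive values among the odd ids.
# The outer i does the keyed subset; .SD holds the matched rows,
# which are then filtered on the unkeyed column.
via_sd <- dt[.(seq(1, 100, 2)), .SD[var > 0, mean(var)]]

# The chained form from the question, for comparison
via_chain <- dt[.(seq(1, 100, 2))][var > 0, mean(var)]

all.equal(via_sd, via_chain)
```

Both forms subset on the key first and then filter on `var`, so they should return the same value here.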

MichaelChirico