
I'm just trying to wrap my brain around data.tables in R, and I keep coming up empty with anything other than the simplest queries. If I have a data.table (call it dtable) with three integer columns, d1, d2, and d3, each representing a date (e.g., 20060415), and want to select a complex subset related to a specific date, the following produces the desired result but results in a vector scan:

dtable[ d1 <= date & (d2 > date | d3 == date) ]

Even if I call setkey(dtable, d1, d2, d3) I seem to get a vector scan (almost certainly multiple vector scans). In my reading of the documentation, I never saw any examples where selectors in the i/where field were effectively anything but ==.

If I simplify the expression to the following, how can the selection be sped up using data.table?

dtable[ d1 <= date ]
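
A minimal reproducible sketch of the setup might look like the following. The sample values are invented for illustration and are not the real data; only the column names, the integer date format, the setkey call, and the two selections come from the question. Adding verbose = TRUE to either selection (as suggested in the comments) should print details about how the subset is evaluated.

```r
library(data.table)

# Made-up sample data (an assumption); dates are integers in YYYYMMDD form
dtable <- data.table(
  d1 = c(20060101L, 20060315L, 20060415L, 20060420L, 20060501L),
  d2 = c(20060201L, 20060430L, 20060601L, 20060415L, 20060701L),
  d3 = c(20060301L, 20060501L, 20060415L, 20060410L, 20060801L)
)

# Key on all three date columns, as described in the question
setkey(dtable, d1, d2, d3)

# The specific date to select against (also a made-up value)
date <- 20060415L

# Full selection from the question
full <- dtable[d1 <= date & (d2 > date | d3 == date)]

# Simplified selection from the question
simple <- dtable[d1 <= date]

# To inspect how the subset is evaluated, add verbose = TRUE:
# dtable[d1 <= date, verbose = TRUE]
```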
smontanaro
  • 2
    There's a related open feature request: https://github.com/Rdatatable/data.table/issues/1453 By the way, to verify whether it's vector scan or not, you can add `DT[..., verbose = TRUE ]` which will print extra details. If you have a more specific question, you'll probably want to edit in a concrete example. Advice on that here: http://stackoverflow.com/a/28481250/ – Frank Nov 15 '16 at 20:34
  • 1
    If you're just trying to "wrap your brain around data.tables", I highly doubt this is something you actually need. That's not to say it's not an interesting question, just that it's unlikely to be a useful one. Please feel free to prove me wrong. – eddi Nov 15 '16 at 20:48
  • 2
    Oh, and another, re your second question: https://github.com/Rdatatable/data.table/issues/1068 – Frank Nov 15 '16 at 20:55

0 Answers