I have a data.table with start_date and end_date columns. Suppose today is 2010-01-05 and I want to filter this table so that only the rows whose date range includes that date are returned. I understand that this can easily be done with a vector scan:
library(data.table)
dt <- data.table(start_date = 20100101:20100120, end_date = 20100105:20100124, value = 1:20)
dt[start_date <= 20100105 & end_date > 20100105, ]
This yields:
dt[start_date <= 20100105 & end_date > 20100105, ]
   start_date end_date value
1:   20100102 20100106     2
2:   20100103 20100107     3
3:   20100104 20100108     4
4:   20100105 20100109     5
However, this will be inefficient for very large tables (20-50 million rows). I know I can select a specific row using the binary search facility of data.table by writing dt[.(20100102, 20100106), ] if the table is keyed on both columns. But how can I leverage binary search to filter on ranges, as in the example above?
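For reference, here is a minimal sketch of the keyed exact-match lookup I am referring to. It assumes the table is keyed on both start_date and end_date (the key choice and the name res are my own for illustration), and it only covers the exact-match case, not the range filter I am asking about:

```r
library(data.table)

# Same example table as above.
dt <- data.table(start_date = 20100101:20100120,
                 end_date   = 20100105:20100124,
                 value      = 1:20)

# Assumption: key on both columns so that the lookup below uses binary search.
setkey(dt, start_date, end_date)

# Exact match on (start_date, end_date); integer literals match the column type.
res <- dt[.(20100102L, 20100106L)]
print(res)
```

This finds the single row with start_date 20100102 and end_date 20100106 without scanning the whole table, but it needs exact values for both key columns, which is why it does not directly express a "date falls inside the range" condition.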