When doing a non-equi inner join, should the order of X[Y] and Y[X] matters? I am under the impression that it should not.
library(data.table) #data.table_1.12.2
dt1 <- data.table(ID=LETTERS[1:4], TIME=2L:5L)
cols1 <- names(dt1)
dt2 <- data.table(ID=c("A", "B"), START=c(1L, 20L), END=c(3L, 30L))
cols2 <- names(dt2)
> dt1
ID TIME
1: A 2
2: B 3
3: C 4
4: D 5
> dt2
ID START END
1: A 1 3
2: B 20 30
I am trying to filter for rows in dt1
such that 1) ID matches and 2) dt1$TIME lies between dt2$START and dt2$END. Desired output:
ID TIME
1: A 2
Since I wanted rows from dt1
, I started with using dt1
as i
in data.table[
but I am getting either columns from dt2
or encountered errors:
#no error but using x. values
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L]
#error for the rest
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, mget(paste0("i.", cols1))]
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, .SD]
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, .(START)]
Error message:
Error in
[.data.table
(dt2, dt1, on = .(ID, START < TIME, END > TIME), : column(s) not found: START
So I had to use dt2
as the i
as a workaround:
#need to type out all the columns:
dt1[dt2, on=.(ID, TIME>START, TIME<END), nomatch=0L, .(ID, TIME=x.TIME)]
#using setNames
dt1[dt2, on=.(ID, TIME>START, TIME<END), nomatch=0L,
setNames(mget(paste0("x.", cols1)), cols1)]
Or is this a simple case of my misunderstanding?
References: