I'm trying to join two data frames (data.tables), x and y, on x$val >= y$start & x$val <= y$end. I can't use dplyr because the only way to do inequality joins in dplyr is to join the tables then filter on the inequality, and the tables I want to join have 315k and 84k records. That would blow out memory.
data.table does have inequality joins, but I cannot for the life of me figure out how the syntax works. See this result:
x <- data.table(val = c(1:5), id = "a")
y <- data.table(start = c(1:5), end = c(11:15), id= "a")
x[y, on=c("val>=start","val<=end"),
.(start, val, end)]
start val end
1: 1 1 11
2: 1 1 11
3: 1 1 11
4: 1 1 11
5: 1 1 11
6: 2 2 12
7: 2 2 12
8: 2 2 12
9: 2 2 12
10: 3 3 13
11: 3 3 13
12: 3 3 13
13: 4 4 14
14: 4 4 14
15: 5 5 15
To show what I would expect to get, here is what dplyr produces:
x <- data.table(val = c(1:5), id = "a")
y <- data.table(start = c(1:5), end = c(11:15), id= "a")
x %>%
inner_join(y) %>%
filter(val >= start & val <= end)
val id start end
1 1 a 1 11
2 2 a 1 11
3 2 a 2 12
4 3 a 1 11
5 3 a 2 12
6 3 a 3 13
7 4 a 1 11
8 4 a 2 12
9 4 a 3 13
10 4 a 4 14
11 5 a 1 11
12 5 a 2 12
13 5 a 3 13
14 5 a 4 14
15 5 a 5 15
Can anyone explain what it is I'm missing with the data.table syntax?