In answering this question about rolling joins with the data.table
package, I've run into some odd behavior when using multiple conditions.
Considering the following datasets:
dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key=c("place"))
dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key=c("place"))
When I want to join dt
with dt_lu
where only those cases of dt_lu
that have the same place
and where dt_lu$num
is higher than dt$num
as follows:
dt_lu[dt, list(tid = i.t_id,
tnum = i.num,
fnum = num[i.num < num],
fid = f_id),
by = .EACHI]
I get the desired result:
place tid tnum fnum fid
1: a 1 5.1 6 1
2: a 1 5.1 6 2
3: a 1 5.1 8 2
4: a 4 5.1 6 1
5: a 4 5.1 6 2
6: a 4 5.1 8 2
7: a 3 5.1 6 1
8: a 3 5.1 6 2
9: a 3 5.1 8 2
10: d 2 6.2 9 1
11: d 2 6.2 7 2
12: d 5 6.2 9 1
13: d 5 6.2 7 2
When I want to add an additional condition, I can get the desired result easily by chaining that additional conditions as follows:
dt_lu[dt, list(tid = i.t_id,
tnum = i.num,
fnum = num[i.num < num],
fid = f_id),
by = .EACHI][fnum - tnum < 2]
which gives me:
place tid tnum fnum fid
1: a 1 5.1 6 1
2: a 1 5.1 6 2
3: a 4 5.1 6 1
4: a 4 5.1 6 2
5: a 3 5.1 6 1
6: a 3 5.1 6 2
7: d 2 6.2 7 2
8: d 5 6.2 7 2
However when I add the extra condition (i.e.: the difference has to be less than 2
) as follows:
dt_lu[dt, list(tid = i.t_id,
tnum = i.num,
fnum = num[i.num < num & num - i.num < 2],
fid = f_id),
by = .EACHI]
I do not get the expected result:
place tid tnum fnum fid
1: a 1 5.1 6 1
2: a 1 5.1 6 2
3: a 1 5.1 6 2
4: a 4 5.1 6 1
5: a 4 5.1 6 2
6: a 4 5.1 6 2
7: a 3 5.1 6 1
8: a 3 5.1 6 2
9: a 3 5.1 6 2
10: d 2 6.2 7 1
11: d 2 6.2 7 2
12: d 5 6.2 7 1
13: d 5 6.2 7 2
Moreover, I get the following warning message:
Warning message: In
[.data.table
(dt_lu, dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < : Column 3 of result for group 1 is length 2 but the longest column in this result is 3. Recycled leaving remainder of 1 items. This warning is once only for the first group with this issue.
The expected result would be:
place tid tnum fnum fid
1: a 1 5.1 6 1
2: a 1 5.1 6 2
4: a 4 5.1 6 1
5: a 4 5.1 6 2
7: a 3 5.1 6 1
8: a 3 5.1 6 2
11: d 2 6.2 7 2
13: d 5 6.2 7 2
I deliberately kept the rownumbers from the first example to show which rows have to be maintained in the final result (which is the same as the working solution).
As this answer shows, it should be possible to use multiple conditions inside the join operation.
I tried the following alternatives, but both of them don't work:
dt_lu[dt, list(tid = i.t_id,
tnum = i.num,
fnum = num[(i.num < num) & (num - i.num < 2)],
fid = f_id),
by = .EACHI]
dt_lu[dt, {
val = num[(i.num < num) & (num - i.num < 2)];
list(tid = i.t_id,
tnum = i.num,
fnum = val,
fid = f_id)},
by = .EACHI]
Could someone explain me why I don't get the desired result with multiple conditions inside the join operation?