I am trying to merge two data.tables that result in multiple matches, to resolve this I want to select at random one of the multiple matches.
Say i have dt1
as such. Note, dt1
has 100+ columns but for demonstration purposes I've only written out three.
dt1 = data.table(x1 = c(100, 100, 200, 200, 200),
x2 = c(101, 102, 201, 202, 203),
v1 = c(1,2,3,4,5),
...
vn = ...)
x1 x2 v1
1: 100 101 1
2: 100 102 2
3: 200 201 3
4: 200 202 4
5: 200 203 5
Similarly dt2
looks like such. It also has more columns than what is shown below:
dt2 = data.table(x1 = c(100, 100, 100, 200, 200),
y1 = c(1, 2, 3, 4, 5),
...)
x1 y1
1: 100 1
2: 100 2
3: 100 3
4: 200 4
5: 200 5
What I want to do is merge on dt1
onto dt2
using column x1
to result in something like this:
x1 x2 v1 y1
1: 100 101 1 1
2: 100 101 1 2
3: 100 102 2 3
4: 200 201 3 4
5: 200 203 5 5
This solution is very close to what I want
Join data frames and select random row when there are multiple matches
however, since both dt1
and dt2
has so many columns I cannot list out every single column (regardless of which way I want to merge dt1 to dt2 or dt2 to dt1).