3

Let's say I have 2 data tables with arbitrary positions and scores. All positions in each data table are unique. For example:

dt1:

position score
10       6.2
21       4.5
37       3.6

dt2:

position score
8        12.2
32       4.2
45       3.8
52       4.9

And I want to get position in df2 that is the closest to each row in df1. So for example, position 10 in df1 has df2 of position 8 as the closest. In the end the result will look like this:

position.dt1 score.dt1 position.dt2 score.dt2 distance
10           6.2       8            12.2      2
21           4.5       32           4.2       11
37           3.6       32           4.2       5

How to achieve this in R?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Algorithman
  • 1,309
  • 1
  • 16
  • 39
  • 1
    what if there were two positions in dt2 with the same distance? – M-- Sep 23 '19 at 21:38
  • all positions are unique, I updated the question – Algorithman Sep 23 '19 at 21:43
  • 3
    You can join with a `roll="nearest"` in *data.table* - take a look at the examples in `?data.table`. E.g. - https://stackoverflow.com/questions/44169249/r-data-table-roll-nearest-not-actually-nearest or https://stackoverflow.com/questions/47897193/rolling-join-by-in-data-table-r and https://stackoverflow.com/questions/54013468/merging-two-sets-of-data-by-data-table-roll-nearest-function – thelatemail Sep 23 '19 at 21:44

1 Answers1

2

In Base R

inds = sapply(dt1$position, function(x) which.min(abs(x - dt2$position)))
transform(dt1, pos.dt2 = dt2$position[inds],
          score.dt2 = dt2$score[inds],
          distance = abs(position - dt2$position[inds]))
#  position score pos.dt2 score.dt2 distance
#1       10   6.2       8      12.2        2
#2       21   4.5      32       4.2       11
#3       37   3.6      32       4.2        5
d.b
  • 32,245
  • 6
  • 36
  • 77