0

I am attempting to extract the closest value based on date between 2 tables.

Table 1 looks like:

id value date
1 aa 10/2/21
2 ab 4/6/18

Table 2 looks like:

id value_2 date_2
1 ba 4/12/11
1 bb 8/16/08
1 bc 10/4/21
2 bd 4/26/18
2 bd 3/26/19

I would like the output table to return value_2 that corresponds to the same id in both tables based on the closest date. ex- id 1 should return value bc since 10/4/21 is closest to 10/2/21

So the output table should mutate this value_2 to look like-

id value date value_2
1 aa 10/2/21 bc
2 ab 4/6/18 bd
  • Something as simple as: `library(e1071); mod <- gknn(value ~ date, data = table2, k = 1); predict(mod, table1)` would probably do. – Oliver Feb 22 '22 at 11:28

1 Answers1

0

There's currently an error in one of the date formats. Further there's some unclarity on what is "closest" (do we only look back or do we also look forward?)

If we are only interested in truly "closest", we could simply use a k-nearest neighbour approach with a single neighbour (k = 1).

library(e1071)
table1 <- data.frame(value = factor(c('aa', 'ab')), 
                     date = as.Date(c('10/2/21', '4/6/18'), 
                                    tryFormats = c('%m/%d/%y', '%d/%m/%y')))
table2 <- data.frame(value = factor(c('ba', 'bb', 'bc', 'bd', 'bd')),
                     date = as.Date(c('4/12/11', '8/16/08', 
                                      # Chagned for example
                                      '2/15/21', 
                                      '4/26/18', '3/26/19'),
                                    tryFormats = c('%m/%d/%y', '%d/%m/%y')))
mod <- gknn(value ~ date, k = 1, data = table2)
table1$predict <- predict(mod, newdata = table1)
table1
#        value       date predict
#      1    aa 2021-10-02      bc
#      2    ab 2018-04-06      bd
Oliver
  • 8,169
  • 3
  • 15
  • 37
  • doesn't this need to be grouped by ```id``` though? – Aamash Haroon Feb 22 '22 at 12:13
  • The ```id``` here is pretty key. As I want the closest date for each ```id``` and return the corresponding ```value_2``` – Aamash Haroon Feb 22 '22 at 12:19
  • Indeed i missed that ID was important here (read over it). I'd suggest iterating over each ID and doing the above again. Either a loop, lapply, split-apply or group_by could achieve this with a bit of alteration of the code. – Oliver Feb 22 '22 at 12:46