-2

I am trying to perform a left join on two dataframes based on nearest timestamp. The sample data looks like this:

> df1
       ID      date1
1      1     2020-07-11 19:14:23
2      1     2020-07-21 13:11:10
3      1     2020-07-21 18:07:25
4      1     2020-07-28 18:18:11
5      2     2020-07-13 16:47:26
6      2     2020-07-18 17:11:37
7      3     2020-07-23 10:39:19

> df2
       ID      date2                 Flag
1      1     2020-07-11 18:14:23     Yes
2      1     2020-07-20 14:21:11     Yes
3      2     2020-07-13 17:18:13     Yes
4      2     2020-07-18 15:12:06     Yes

I want to merge the two data frames by ID and dates columns so that the Flag column can be joined in df1 to the nearest date. Result is to look like this

> Combined
       ID      date1                 Flag
1      1     2020-07-11 19:14:23     Yes
2      1     2020-07-21 13:11:10     Yes
3      1     2020-07-21 18:07:25     
4      1     2020-07-28 18:18:11    
5      2     2020-07-13 16:47:26     Yes
6      2     2020-07-18 17:11:37     Yes
7      3     2020-07-23 10:39:19

Couldn't find an apt solution. Please help.

Looper
  • 295
  • 2
  • 3
  • 10
  • Did you see [this](https://stackoverflow.com/questions/15712826/join-r-data-tables-where-key-values-are-not-exactly-equal-combine-rows-with-clo) or [this](https://stackoverflow.com/questions/31818444/join-two-data-frames-in-r-based-on-closest-timestamp)? – David Arenburg Aug 26 '20 at 13:13

2 Answers2

2

Something like this might work.

library(data.table)
setDT(df2)[, join_date := date2]
setDT(df1)[, join_date := date1]

# rolling join
df<-df1[df2, on = .(ID, join_date), roll = "nearest"]
Tanner33
  • 120
  • 2
  • 15
  • getting this error ``` Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, : roll='nearest' can't be applied to a character column, yet. ``` – Looper Aug 26 '20 at 13:28
  • You need to convert your columns to proper classes – David Arenburg Aug 26 '20 at 13:37
  • I have converted them into proper classes. columns 'ID' and 'Flag' are the character columns and date1 and date 2 are in date format only – Looper Aug 26 '20 at 17:02
  • 1
    Data.table rolls on last key column. In the answer it is set to ID, when it needs to be join_date. So the last line should be `df<-df1[df2, on = .(ID, join_date), roll = "nearest"]` – Brian Fisher Oct 28 '21 at 18:46
  • Thanks @BrianFisher. I made this edit to the post. – Tanner33 Oct 29 '21 at 15:47
1

Here is a roundabout way using dplyr

library(dplyr)
df1 %>% 
  left_join(df2 %>%
    left_join(df1) %>%
    mutate(date_diff = abs(date2 - date1)) %>%
    group_by(ID, date2) %>%
    filter(date_diff == min(date_diff)) %>%
    ungroup() %>%
    select(-date2, -date_diff) ) %>% 
  mutate(Flag = case_when(is.na(Flag) ~ "No",
                          TRUE ~ Flag))
Joining, by = "ID"
Joining, by = c("ID", "date1")
# A tibble: 7 x 3
     ID date1               Flag 
  <dbl> <dttm>              <chr>
1     1 2020-07-11 19:14:23 Yes  
2     1 2020-07-21 13:11:10 Yes  
3     1 2020-07-21 18:07:25 No   
4     1 2020-07-28 18:18:11 No   
5     2 2020-07-13 16:47:26 Yes  
6     2 2020-07-18 17:11:37 Yes  
7     3 2020-07-23 10:39:19 No 
Ben Norris
  • 5,639
  • 2
  • 6
  • 15