Given two datasets with dates, how can I accurately assign the units in one data frame to an event in the other?
I tried the fuzzyjoin package, but have not been successful. The challenge is that an ancillary event can overlap two anchor events, so each one must be assigned based on its start and end dates.
library(tidyverse)
library(lubridate)
library(fuzzyjoin)
anchor_df <- tribble(
  ~person, ~anchor_beg,  ~anchor_end,
  'a',     '01-01-2020', '01-05-2020',
  'a',     '01-17-2020', '01-18-2020',
  'a',     '02-11-2020', '02-22-2020',
  'b',     '04-01-2020', '04-07-2020'
)
ancillary_df <- tribble(
  ~person, ~anc_start,   ~anc_end,     ~units,
  'a',     '01-07-2020', '01-11-2020', 3,
  'a',     '02-24-2020', '03-22-2020', 15,
  'b',     '04-08-2020', '06-07-2020', 25
)
anchor_df$anchor_beg <- mdy(anchor_df$anchor_beg)
anchor_df$anchor_end <- mdy(anchor_df$anchor_end)
ancillary_df$anc_start <- mdy(ancillary_df$anc_start)
ancillary_df$anc_end <- mdy(ancillary_df$anc_end)
fuzzy_left_join(
  ancillary_df, anchor_df,
  by = c(
    "person" = "person",
    "anc_start" = "anchor_end",
    "anc_start" = "anchor_beg"
  ),
  match_fun = list(`==`, `>=`, `<=`)
)
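For reference, the join above requires anc_start >= anchor_end and anc_start <= anchor_beg at the same time. That can only hold when an anchor event ends on or before the day it begins, so it matches nothing in this data. Below is a minimal sketch of one possible assignment rule, not a definitive solution. It assumes each ancillary event belongs to the most recent anchor event of the same person that ended on or before anc_start. That rule is an assumption made for illustration, and the names candidates, n_original_matches and assigned are hypothetical. Ancillary events that start before any anchor event has ended, or that start in the middle of one, would need a different rule.

```r
library(dplyr)
library(lubridate)

anchor_df <- tibble::tribble(
  ~person, ~anchor_beg,  ~anchor_end,
  "a",     "01-01-2020", "01-05-2020",
  "a",     "01-17-2020", "01-18-2020",
  "a",     "02-11-2020", "02-22-2020",
  "b",     "04-01-2020", "04-07-2020"
) %>%
  mutate(across(c(anchor_beg, anchor_end), mdy))

ancillary_df <- tibble::tribble(
  ~person, ~anc_start,   ~anc_end,     ~units,
  "a",     "01-07-2020", "01-11-2020", 3,
  "a",     "02-24-2020", "03-22-2020", 15,
  "b",     "04-08-2020", "06-07-2020", 25
) %>%
  mutate(across(c(anc_start, anc_end), mdy))

# Every ancillary/anchor pair for the same person (base merge, equality on person only).
candidates <- merge(ancillary_df, anchor_df, by = "person")

# Number of pairs that satisfy the conditions used in the fuzzy_left_join attempt.
n_original_matches <- with(
  candidates,
  sum(anc_start >= anchor_end & anc_start <= anchor_beg)
)

# ASSUMED rule: keep anchors that ended on or before the ancillary start,
# then pick the latest such anchor for each ancillary event.
assigned <- candidates %>%
  filter(anchor_end <= anc_start) %>%
  group_by(person, anc_start, anc_end) %>%
  slice_max(anchor_end, n = 1, with_ties = FALSE) %>%
  ungroup()

print(n_original_matches)
print(assigned)
```

Under this assumed rule, each ancillary row ends up next to exactly one anchor row, so units can be summed per person and anchor_beg afterwards.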
My desired output is:
I would appreciate any pointers.