I have two DF
and I would like to group them together, however I have to check if the data is within a date range. My first DF:
ID <- c(111,222,333,444,555,666)
DT_START_HOSP <- as.Date(c('2021/01/07','2021/01/11','2021/01/21','2021/01/21','2021/01/21','2021/01/22'))
DT_END_HOSP <- as.Date(c('2021/01/10','2021/01/20','2021/01/25','2021/02/01','2021/01/29','2021/02/02'))
HOSP <- data.frame(ID,DT_START_HOSP,DT_END_HOSP)
My second DF
:
ID <- c(1010,1010,1010,222,222,5050,5050,666,666)
DT_START_OUT <- as.Date(c('2021/01/01','2021/01/11','2021/01/30','2021/01/02','2021/01/15','2021/03/15','2021/04/20','2021/01/25','2021/01/28'))
DT_END_OUT <- as.Date(c('2021/01/01','2021/01/11','2021/01/30','2021/01/02','2021/01/15','2021/03/20','2021/04/20','2021/01/25','2021/01/30'))
OUT <- data.frame(ID,DT_START_OUT,DT_END_OUT)
I want to select only the “ID's” that are within the range of the columns (DT_START_HOSP and DT_END_HOSP)
compared to the columns (DT_START_OUT and DT_END_OUT)
. Thus, the result I expect is:
ID <- c(111,222,333,444,555,666,666)
DT_START_HOSP <- as.Date(c('2021/01/07','2021/01/11','2021/01/21','2021/01/21','2021/01/21',' 2021/01/22','2021/01/22'))
DT_END_HOSP <- as.Date(c('2021/01/10','2021/01/20','2021/01/25','2021/02/01','2021/01/29',' 2021/02/02','2021/02/02'))
DT_START_OUT <- as.Date(c('','2021/01/15','','','','2021/01/25','2021/01/28'))
DT_END_OUT <- as.Date(c('','2021/01/15','','','','2021/01/25','2021/01/30'))
HOSP <- data.frame(ID,DT_START_HOSP,DT_END_HOSP,DT_START_OUT,DT_END_OUT)
However, I used this code (Join tables by date range), but it didn't work. See the result
library(sqldf)
RESULT <- sqldf("select * from HOSP
left join OUT
on HOSP.DT_START_HOSP between OUT.DT_START_OUT and OUT.DT_END_OUT")
Is it possible to perform this join respecting a date range?