0

I have two DF and I would like to group them together, however I have to check if the data is within a date range. My first DF:

ID <- c(111,222,333,444,555,666)
DT_START_HOSP <- as.Date(c('2021/01/07','2021/01/11','2021/01/21','2021/01/21','2021/01/21','2021/01/22'))
DT_END_HOSP <- as.Date(c('2021/01/10','2021/01/20','2021/01/25','2021/02/01','2021/01/29','2021/02/02'))
HOSP <- data.frame(ID,DT_START_HOSP,DT_END_HOSP)

My second DF:

ID <- c(1010,1010,1010,222,222,5050,5050,666,666)
DT_START_OUT <- as.Date(c('2021/01/01','2021/01/11','2021/01/30','2021/01/02','2021/01/15','2021/03/15','2021/04/20','2021/01/25','2021/01/28'))
DT_END_OUT <- as.Date(c('2021/01/01','2021/01/11','2021/01/30','2021/01/02','2021/01/15','2021/03/20','2021/04/20','2021/01/25','2021/01/30'))
OUT <- data.frame(ID,DT_START_OUT,DT_END_OUT)

I want to select only the “ID's” that are within the range of the columns (DT_START_HOSP and DT_END_HOSP) compared to the columns (DT_START_OUT and DT_END_OUT). Thus, the result I expect is:

ID <- c(111,222,333,444,555,666,666)
DT_START_HOSP <- as.Date(c('2021/01/07','2021/01/11','2021/01/21','2021/01/21','2021/01/21',' 2021/01/22','2021/01/22'))
DT_END_HOSP <- as.Date(c('2021/01/10','2021/01/20','2021/01/25','2021/02/01','2021/01/29',' 2021/02/02','2021/02/02'))
DT_START_OUT <- as.Date(c('','2021/01/15','','','','2021/01/25','2021/01/28'))
DT_END_OUT <- as.Date(c('','2021/01/15','','','','2021/01/25','2021/01/30'))
HOSP <- data.frame(ID,DT_START_HOSP,DT_END_HOSP,DT_START_OUT,DT_END_OUT)

However, I used this code (Join tables by date range), but it didn't work. See the result

library(sqldf)
RESULT <- sqldf("select * from HOSP
                left join OUT
                on HOSP.DT_START_HOSP between OUT.DT_START_OUT and OUT.DT_END_OUT")

enter image description here

Is it possible to perform this join respecting a date range?

user438383
  • 5,716
  • 8
  • 28
  • 43
Bruno Avila
  • 296
  • 2
  • 10
  • 1
    Does this answer your question? [Merge 2 dataframes if value within range](https://stackoverflow.com/questions/23934361/merge-2-dataframes-if-value-within-range) – Dylan_Gomes Oct 27 '21 at 18:35

1 Answers1

2

Your logic is not as simple as "between", since it appears that you want any kind of overlap, whether a superset or otherwise. For that, we need a slightly different query (and should include ID on the left-join as well, I'm inferring).

sqldf::sqldf("
  select h.*, o.DT_START_OUT, o.DT_END_OUT
  from HOSP h
    left join OUT o on h.ID = o.ID
      and h.DT_START_HOSP < o.DT_END_OUT
      and h.DT_END_HOSP > o.DT_START_OUT")
#    ID DT_START_HOSP DT_END_HOSP DT_START_OUT DT_END_OUT
# 1 111    2021-01-07  2021-01-10         <NA>       <NA>
# 2 222    2021-01-11  2021-01-20   2021-01-15 2021-01-15
# 3 333    2021-01-21  2021-01-25         <NA>       <NA>
# 4 444    2021-01-21  2021-02-01         <NA>       <NA>
# 5 555    2021-01-21  2021-01-29         <NA>       <NA>
# 6 666    2021-01-22  2021-02-02   2021-01-25 2021-01-25
# 7 666    2021-01-22  2021-02-02   2021-01-28 2021-01-30

(Thank you for fixing your data from the previous question and the first draft of this one. For the record, you may want this, some handy code that deals well with vectors of dates in inconsistent/different formats.)

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thanks for the contribution. I really had done something wrong. I'm building code to try to identify scams and I believe this will help me. – Bruno Avila Oct 28 '21 at 14:11