SQL for Determining Overlapping Date Ranges

Question

I am working on an issue in SQL Developer where I need to calculate how many days someone was taking 2 different classes of drugs on the same day. The issue I'm having is that within the same drug class, there can be overlapping periods, but I don't really care about those, just overlaps between the different drug classes. Example of the 2 data sets I am working with below (this is not real patient information)

Table 1

ID	START	END	CLASS
1	10-FEB-21	17-FEB-21	A
1	15-FEB-21	21-FEB-21	A
1	11-MAR-21	21-MAR-21	A
1	19-APR-21	14-MAY-21	A
1	10-MAY-21	11-JUN-21	A

Table 2

ID	START	END	CLASS
1	16-JAN-21	28-FEB-21	B
1	15-MAR-21	14-APR-21	B
1	12-APR-21	12-MAY-21	B
1	09-MAY-21	14-MAY-21	B
1	07-JUN-21	01-JUL-21	B

So I need to know how many days, starting from 01-JAN-21, they are taking a Class A drug as well as a Class B drug. I am working in SQL Developer but if you have a solution using another language that would be interesting to see as well.

it's unclear what programming language do you aim with this question. Is it in SQL, Python or R? — Kardi Teknomo, Jul 08 '21 at 03:44
Ideally SQL. But if you have a solution in python or R I would use that — KGraves, Jul 08 '21 at 04:54
I don't know why this question would be closed a "not focused". It seems really, really clear to me and no doubt to the others who answered the question. — Gordon Linoff, Jul 08 '21 at 12:20
I agree with Gordon. I feel like the question is clear. Not totally sure how to edit it to be more specific. — KGraves, Jul 08 '21 at 16:08

score 1 · Answer 1 · answered Jul 08 '21 at 01:06

Probably the simplest way is to expand by days and join:

with cte_a (id, start, end, class) as (
      select id, start, end, class
      from table1
      where end >= date '2021-01-01'
      union all
      select id, start + interval '1' day, end, class
      from cte_a
      where start < end
     ),
     cte_b (id, start, end, class) as (
      select id, start, end, class
      from table1
      where end >= date '2021-01-01'
      union all
      select id, start + interval '1' day, end, class
      from cte_b
      where start < end
     )
select a.id, count(distinct start)
from cte_a a join
     cta_b b
     on a.id = b.id and
        a.start = b.start
group by id;

score 1 · Answer 2 · answered Jul 08 '21 at 01:23

Here is another option using data.table in R:

#summarize data into non-overlapping periods first (see reference 1)
A <- DTA[, g := c(0L, cumsum(shift(START, -1L) > cummax(as.integer(END)))[-.N]), ID][, 
    .(CLASS=CLASS[1L], START=min(START), END=max(END)), .(ID, g)]
B <- DTB[, g := c(0L, cumsum(shift(START, -1L) > cummax(as.integer(END)))[-.N]), ID][, 
    .(CLASS=CLASS[1L], START=min(START), END=max(END)), .(ID, g)]

#overlapping join and calculate the overlapping dates
setkey(B, ID, START, END)
ans <- foverlaps(A, B)[, dif := pmin(END, i.END) - pmax(START, i.START)]

#aggregate number of overlapping dates per ID
ans[, .(ndays_ovlap=sum(dif)), ID]

data:

library(data.table)
DTA <- fread("ID    START   END CLASS
1   10-FEB-21   17-FEB-21   A
1   15-FEB-21   21-FEB-21   A
1   11-MAR-21   21-MAR-21   A
1   19-APR-21   14-MAY-21   A
1   10-MAY-21   11-JUN-21   A")
cols <- c("START", "END")
DTA[, (cols) := lapply(.SD, as.IDate, format="%d-%b-%y"), .SDcols=cols]

DTB <- fread("ID    START   END CLASS
1   16-JAN-21   28-FEB-21   B
1   15-MAR-21   14-APR-21   B
1   12-APR-21   12-MAY-21   B
1   09-MAY-21   14-MAY-21   B
1   07-JUN-21   01-JUL-21   B")
DTB[, (cols) := lapply(.SD, as.IDate, format="%d-%b-%y"), .SDcols=cols]

Reference:

How to flatten / merge overlapping time periods

Adrian White · Answer 3 · 2021-07-08T05:23:24.270

Does this solve your problem ... simply join on ID and CLASS with predicates sorting out the dates?

select * , 
datediff(days,greatest(TABLE_1.START_DATE,TABLE_2.START_DATE ) 
,least( TABLE_1.END_DATE,TABLE_2.END_DATE )) overlapping_days 
from  TABLE_1 inner join TABLE_2 
    on TABLE_1.id = TABLE_2.id AND TABLE_1.CLASS <>  TABLE_2.CLASS
where      TABLE_1.START_DATE    between TABLE_2.START_DATE and TABLE_2.END_DATE  
or         TABLE_1.END_DATE      between TABLE_2.START_DATE and TABLE_2.END_DATE

Or am I missing something? (code to copy/paste into snowflake)

with TABLE_1 as ( 
select 1 ID,    TO_DATE('10-FEB-21','DD-MON-YY') START_DATE, TO_DATE('17-FEB-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('15-FEB-21','DD-MON-YY') START_DATE, TO_DATE('21-FEB-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('11-MAR-21','DD-MON-YY') START_DATE, TO_DATE('21-MAR-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('19-APR-21','DD-MON-YY') START_DATE, TO_DATE('14-MAY-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('10-MAY-21','DD-MON-YY') START_DATE, TO_DATE('11-JUN-21','DD-MON-YY') END_DATE, 'A' CLASS)
, TABLE_2 AS ( 
 SELECT 1 ID,   TO_DATE('16-JAN-21','DD-MON-YY') START_DATE,    TO_DATE('28-FEB-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('15-MAR-21','DD-MON-YY') START_DATE,    TO_DATE('14-APR-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('12-APR-21','DD-MON-YY') START_DATE,    TO_DATE('12-MAY-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('09-MAY-21','DD-MON-YY') START_DATE,    TO_DATE('14-MAY-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('07-JUN-21','DD-MON-YY') START_DATE,    TO_DATE('01-JUL-21','DD-MON-YY') END_DATE,  'B' CLASS)

select * , datediff(days, greatest(TABLE_1.START_DATE,TABLE_2.START_DATE ) , least( TABLE_1.END_DATE,TABLE_2.END_DATE )) overlapping_days from  TABLE_1 inner join TABLE_2 
    on TABLE_1.id = TABLE_2.id AND TABLE_1.CLASS <>  TABLE_2.CLASS
where      TABLE_1.START_DATE    between TABLE_2.START_DATE and TABLE_2.END_DATE  
or         TABLE_1.END_DATE      between TABLE_2.START_DATE and TABLE_2.END_DATE

SQL for Determining Overlapping Date Ranges

3 Answers3