2

I am working on an issue in SQL Developer where I need to calculate how many days someone was taking 2 different classes of drugs on the same day. The issue I'm having is that within the same drug class, there can be overlapping periods, but I don't really care about those, just overlaps between the different drug classes. Example of the 2 data sets I am working with below (this is not real patient information)

Table 1

ID START END CLASS
1 10-FEB-21 17-FEB-21 A
1 15-FEB-21 21-FEB-21 A
1 11-MAR-21 21-MAR-21 A
1 19-APR-21 14-MAY-21 A
1 10-MAY-21 11-JUN-21 A

Table 2

ID START END CLASS
1 16-JAN-21 28-FEB-21 B
1 15-MAR-21 14-APR-21 B
1 12-APR-21 12-MAY-21 B
1 09-MAY-21 14-MAY-21 B
1 07-JUN-21 01-JUL-21 B

So I need to know how many days, starting from 01-JAN-21, they are taking a Class A drug as well as a Class B drug. I am working in SQL Developer but if you have a solution using another language that would be interesting to see as well.

KGraves
  • 21
  • 2

3 Answers3

1

Probably the simplest way is to expand by days and join:

with cte_a (id, start, end, class) as (
      select id, start, end, class
      from table1
      where end >= date '2021-01-01'
      union all
      select id, start + interval '1' day, end, class
      from cte_a
      where start < end
     ),
     cte_b (id, start, end, class) as (
      select id, start, end, class
      from table1
      where end >= date '2021-01-01'
      union all
      select id, start + interval '1' day, end, class
      from cte_b
      where start < end
     )
select a.id, count(distinct start)
from cte_a a join
     cta_b b
     on a.id = b.id and
        a.start = b.start
group by id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Here is another option using data.table in R:

#summarize data into non-overlapping periods first (see reference 1)
A <- DTA[, g := c(0L, cumsum(shift(START, -1L) > cummax(as.integer(END)))[-.N]), ID][, 
    .(CLASS=CLASS[1L], START=min(START), END=max(END)), .(ID, g)]
B <- DTB[, g := c(0L, cumsum(shift(START, -1L) > cummax(as.integer(END)))[-.N]), ID][, 
    .(CLASS=CLASS[1L], START=min(START), END=max(END)), .(ID, g)]

#overlapping join and calculate the overlapping dates
setkey(B, ID, START, END)
ans <- foverlaps(A, B)[, dif := pmin(END, i.END) - pmax(START, i.START)]

#aggregate number of overlapping dates per ID
ans[, .(ndays_ovlap=sum(dif)), ID]

data:

library(data.table)
DTA <- fread("ID    START   END CLASS
1   10-FEB-21   17-FEB-21   A
1   15-FEB-21   21-FEB-21   A
1   11-MAR-21   21-MAR-21   A
1   19-APR-21   14-MAY-21   A
1   10-MAY-21   11-JUN-21   A")
cols <- c("START", "END")
DTA[, (cols) := lapply(.SD, as.IDate, format="%d-%b-%y"), .SDcols=cols]

DTB <- fread("ID    START   END CLASS
1   16-JAN-21   28-FEB-21   B
1   15-MAR-21   14-APR-21   B
1   12-APR-21   12-MAY-21   B
1   09-MAY-21   14-MAY-21   B
1   07-JUN-21   01-JUL-21   B")
DTB[, (cols) := lapply(.SD, as.IDate, format="%d-%b-%y"), .SDcols=cols]

Reference:

  1. How to flatten / merge overlapping time periods
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
0

Does this solve your problem ... simply join on ID and CLASS with predicates sorting out the dates?

select * , 
datediff(days,greatest(TABLE_1.START_DATE,TABLE_2.START_DATE ) 
,least( TABLE_1.END_DATE,TABLE_2.END_DATE )) overlapping_days 
from  TABLE_1 inner join TABLE_2 
    on TABLE_1.id = TABLE_2.id AND TABLE_1.CLASS <>  TABLE_2.CLASS
where      TABLE_1.START_DATE    between TABLE_2.START_DATE and TABLE_2.END_DATE  
or         TABLE_1.END_DATE      between TABLE_2.START_DATE and TABLE_2.END_DATE  

enter image description here

Or am I missing something? (code to copy/paste into snowflake)

with TABLE_1 as ( 
select 1 ID,    TO_DATE('10-FEB-21','DD-MON-YY') START_DATE, TO_DATE('17-FEB-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('15-FEB-21','DD-MON-YY') START_DATE, TO_DATE('21-FEB-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('11-MAR-21','DD-MON-YY') START_DATE, TO_DATE('21-MAR-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('19-APR-21','DD-MON-YY') START_DATE, TO_DATE('14-MAY-21','DD-MON-YY') END_DATE, 'A' CLASS
union select 1 ID,  TO_DATE('10-MAY-21','DD-MON-YY') START_DATE, TO_DATE('11-JUN-21','DD-MON-YY') END_DATE, 'A' CLASS)
, TABLE_2 AS ( 
 SELECT 1 ID,   TO_DATE('16-JAN-21','DD-MON-YY') START_DATE,    TO_DATE('28-FEB-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('15-MAR-21','DD-MON-YY') START_DATE,    TO_DATE('14-APR-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('12-APR-21','DD-MON-YY') START_DATE,    TO_DATE('12-MAY-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('09-MAY-21','DD-MON-YY') START_DATE,    TO_DATE('14-MAY-21','DD-MON-YY') END_DATE,  'B' CLASS
UNION SELECT 1 ID,  TO_DATE('07-JUN-21','DD-MON-YY') START_DATE,    TO_DATE('01-JUL-21','DD-MON-YY') END_DATE,  'B' CLASS)

select * , datediff(days, greatest(TABLE_1.START_DATE,TABLE_2.START_DATE ) , least( TABLE_1.END_DATE,TABLE_2.END_DATE )) overlapping_days from  TABLE_1 inner join TABLE_2 
    on TABLE_1.id = TABLE_2.id AND TABLE_1.CLASS <>  TABLE_2.CLASS
where      TABLE_1.START_DATE    between TABLE_2.START_DATE and TABLE_2.END_DATE  
or         TABLE_1.END_DATE      between TABLE_2.START_DATE and TABLE_2.END_DATE  
Adrian White
  • 1,720
  • 12
  • 14