0

I am running this problem on SQL server

Here is my problem.

have something like this

Dataset A
FK_ID  StartDate  EndDate     Type
1      10/1/2018  11/30/2018  M
1      12/1/2018  2/28/2019   N
1      3/1/2019   10/31/2019  M

I have a second data source I have no control over with data something like this:

Dataset B
FK_ID  SpanStart  SpanEnd    Type
1      10/1/2018  10/15/2018 M
1      10/1/2018  10/25/2018 M
1      2/15/2019  4/30/2019  M
1      5/1/2019   10/31/2019 M

What I am trying to accomplish is to check to make sure every date within each TYPE M record in Dataset A has at least 1 record in Dataset B.

For example record 1 in Dataset A does NOT have coverage from 10/26/2018 through 11/30/2018. I really only care about when the coverage ends, in this case I want to return 10/26/2018 because it is the first date where the span has no coverage from Dataset B.

I've written a function that does this but it is pretty slow because it is cycling through each date within each M record and counting the number of records in Dataset B. It exits the loop when it finds the first one but I would really like to make this more efficient. I am sure I am not thinking about this properly so any suggestions anyone can offer would be helpful.

This is the section of code I'm currently running

        else if @SpanType = 'M'
            begin
                    set @CurrDate = @SpanStart
                    set @UncovDays = 0
                    while @CurrDate <= @SpanEnd
                        Begin
                            if (SELECT count(*) 
                                FROM eligiblecoverage ec join eligibilityplan ep on ec.plandescription = ep.planname 
                                WHERE ec.masterindividualid = @IndID
                                    and ec.planbegindate <= @CurrDate and ec.planenddate >= @CurrDate
                                    and ec.sourcecreateddate = @MaxDate
                                    and ep.medicaidcoverage = 1) = 0 
                                begin
                                    SET @Result = concat('NON Starting ',format(@currdate, 'M/d/yyyy'))
                                    BREAK
                                end
                            set @CurrDate = @CurrDate + 1
                        end
                    end

I am not married to having a function it just could not find a way to do this in queries that wasn't very very slow.

EDIT: Dataset B will never have any TYPEs except M so that is not a consideration

EDIT 2: The code offered by DonPablo does de-overlap the data but only in cases where there is an overlap at all. It reduces dataset B to:

FK_ID  SpanStart  SpanEnd  Type
1      10/1/2018  10/25/2018 M

instead of

FK_ID  SpanStart  SpanEnd   Type
1      10/1/2018  10/25/2018 M
1      2/15/2019  4/30/2019  M
1      5/1/2019   10/31/2019 M

I am still futzing around with it but it's a start.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
rfp
  • 11
  • 2
  • I notice that Dataset B has overlapping coverage -- row2 starts within span of row1. Is that intentional? – donPablo Oct 25 '19 at 23:54
  • Yes, the items in dataset B may overlap one another – rfp Oct 26 '19 at 00:09
  • But dataset B row3 has such a wide span that it does indeed cover set A row1. These sets might profit from having an autonum/identity ID for easy reference. – donPablo Oct 26 '19 at 00:19
  • The first thing is to DE-Overlap dataset B, that is to concatenate multiple rows (2,3,4,...) into one SPAN-Range. The second thing is to find set A rows that are fully in some De-overlapped B. Thirdly, some remaining set A rows will be part of no De-Overlapped B range. Lastly, the final set A rows will be partially covered either Start or End by a De-Overlapped B range. Else the final set of A rows will partially fall within multiple overlapped B rows. – donPablo Oct 26 '19 at 00:54
  • You are right on Row 3 of Dataset B, I have fixed it to be 2/15/2019 instead of 2/15/2018. These tables each do have identity columns but neither of the tables uses what I've labeled as the FK_ID. it's just the field I need to group records on because in actuality there may be 4 or 5 different rows all having the same date range in dataset B. – rfp Oct 26 '19 at 01:12
  • How would you propose to do your suggestion though. The primary problem is the dataset B data is all over the place and may or may not overlap with other records in the same dataset. – rfp Oct 26 '19 at 01:14
  • I changed the tags. `set @CurrDate = @SpanStart` is not valid SQL. – wildplasser Oct 26 '19 at 12:21

2 Answers2

0

Start by framing the problem in smaller pieces, in a sequence of actions like I did in the comment. See George Polya "How To Solve It" 1945

Then Google is your friend -- look at==> sql de-overlap date ranges into one record (over a million results) UPDATED--I picked Merge overlapping dates in SQL Server and updated it for our table and column names.

Also look at theory from 1983 Allen's Interval Algebra https://www.ics.uci.edu/~alspaugh/cls/shr/allen.html Or from 2014 https://stewashton.wordpress.com/2014/03/11/sql-for-date-ranges-gaps-and-overlaps/ This is a primer on how to setup test data for this problem.

Finally determine what counts via Ranking the various pairs of A vs B -- bypass those totally Within, then work with earliest PartialOverlaps, lastly do the Precede/Follow items.

--from Merge overlapping dates in SQL Server

with SpanStarts as 
( 
  select distinct FK_ID, SpanStart 
  from Coverage_B as t1 
  where not exists 
    (select * from Coverage_B as t2 
     where t2.FK_ID = t1.FK_ID 
       and t2.SpanStart < t1.SpanStart 
       and t2.SpanEnd >= t1.SpanStart) 
), 
SpanEnds as 
( 
  select distinct FK_ID, SpanEnd 
  from Coverage_B as t1 
  where not exists 
    (select * from Coverage_B as t2 
     where t2.FK_ID = t1.FK_ID 
       and t2.SpanEnd > t1.SpanEnd 
       and t2.SpanStart <= t1.SpanEnd) 
),
DeOverlapped_B as
(
Select FK_ID, SpanStart, 
  (select min(SpanEnd) from SpanEnds as e 
   where e.FK_ID = s.FK_ID 
     and SpanEnd >= SpanStart) as SpanEnd 
from SpanStarts as s
)

Select * from DeOverlapped_B

Now we have something to feed into the next steps, and we can use the above as a CTE

======================================

with SpanStarts as 
( 
  select distinct FK_ID, SpanStart 
  from Coverage_B as t1 
  where not exists 
    (select * from Coverage_B as t2 
     where t2.FK_ID = t1.FK_ID 
       and t2.SpanStart < t1.SpanStart 
       and t2.SpanEnd >= t1.SpanStart) 
), 
SpanEnds as 
( 
  select distinct FK_ID, SpanEnd 
  from Coverage_B as t1 
  where not exists 
    (select * from Coverage_B as t2 
     where t2.FK_ID = t1.FK_ID 
       and t2.SpanEnd > t1.SpanEnd 
       and t2.SpanStart <= t1.SpanEnd) 
),
DeOverlapped_B as
(
Select FK_ID, SpanStart, 
  (select min(SpanEnd) from SpanEnds as e 
   where e.FK_ID = s.FK_ID 
     and SpanEnd >= SpanStart) as SpanEnd 
from SpanStarts as s
),

-- find A row's coverage
 ACoverage as (
Select
    a.*, b.SpanEnd, b.SpanStart,
    Case 
    When SpanStart <= StartDate And  StartDate <= SpanEnd
    And  SpanStart <= EndDate   And  EndDate   <= SpanEnd
    Then '1within' -- starts, equals, during, finishes

    When                             EndDate   < SpanStart
    Or   SpanEnd   < StartDate   
    Then '3beforeAfter' -- preceeds, meets, preceeded, met 

    Else '2overlap'  -- one or two ends hang over spanStart/End
    End as relation

    From Coverage_A a
    Left Join DeOverlapped_B b
    On a.FK_ID = b.FK_ID

    Where a.Type = 'M'
)

Select
    *
    ,Case
    When relation1 = '2' And StartDate < SpanStart Then StartDate
    When relation1 = '2' Then DateAdd(d, 1, SpanEnd)
    When relation1 = '3' Then StartDate
    End as UnCoveredBeginning 
    From (
    Select 
        *
        ,SUBSTRING(relation,1,1) as relation1
        ,ROW_NUMBER() Over (Partition by A_ID Order by relation, SpanStart) as Rownum 
        from ACoverage  
    ) aRNO
    Where Rownum = 1 
    And   relation1 <> '1'
donPablo
  • 1,937
  • 1
  • 13
  • 18
  • I am still testing but this does appear to de-overlap, but only in cases where the FK_ID *has* overlapping spans. For all other spans it removes them (i.e. a span with no overlaps at all). I think I need to do more reading on the EXISTS functions in the join/where clauses. – rfp Oct 26 '19 at 05:16
  • I agree it has a flaw.. While this sql is a start, it does not perform correctly when I add more test cases. Back to the drawing board. – donPablo Oct 26 '19 at 17:12
  • @rfp I have reworked and revised the DeOverlap_B process so that it works now. And then added a step to join Coverage_A with DeOverlapped_B, and final step to compute UnCoveredBeginning date for those not covered. – donPablo Oct 27 '19 at 04:32
  • Thank you donPablo, I will have to try to implement this in the next couple of weeks. I will get back to you on final results. Thanks again. – rfp Oct 31 '19 at 03:37
0

I would approach this by focusing on B. My assumption is that any absent record would follow span_end in the table. So here is the idea:

  • Unpivot the dates in B (adding "1" to the end dates)
  • Add a flag if they are present with type "M".
  • Check to see if any not-present records are in the span for A.
  • Check the first and last dates as well.

So, this looks like:

with bdates as (
      select v.dte,
             (case when exists (select 1
                                from b b2
                                where v.dte between b2.spanstart and b2.spanend and
                                      b2.type = 'M'
                               )
                   then 1 else 0
              end) as in_b
      from b cross apply
           (values (spanstart), (dateadd(day, 1, spanend)
           ) v(dte)
      where b.type = 'M'  -- all we care about
      group by v.dte      -- no need for duplicates
     )
select a.*,
       (case when not exists (select 1
                              from b b2
                              where a.startdate between b2.spanstart and b2.spanend and
                                      b2.type = 'M'
                             )
             then 0
             when not exists (select 1
                              from b b2
                              where a.enddate between b2.spanstart and b2.spanend and
                                      b2.type = 'M'
                             )
             when exists (select 1
                          from bdates bd
                          where bd.dte between a.startdate and a.enddate and
                                bd.in_b = 0
                         )
             then 0
             when exists (select 1
                          from b b2
                          where a.startdate between b2.spanstart and b2.spanend and
                                b2.type = 'M'
                         )
             then 1
             else 0
       end)
from a;

What is this doing? Four validity checks:

  • Is the starttime valid?
  • Is the endtime valid?
  • Are any intermediate dates invalid?
  • Is there at least one valid record?
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786