3

I have a table with the following columns : sID, start_date and end_date

Some of the values are as follows:

1   1995-07-28  2003-07-20 
1   2003-07-21  2010-05-04 
1   2010-05-03  2010-05-03 
2   1960-01-01  2011-03-01 
2   2011-03-02  2012-03-13 
2   2012-03-12  2012-10-21 
2   2012-10-22  2012-11-08 
3   2003-07-23  2010-05-02

I only want the 2nd and 3rd rows in my result as they are the overlapping date ranges.

I tried this but it would not get rid of the first row. Not sure where I am going wrong?

select a.sID from table a
inner join table b 
on a.sID = b.sID
and ((b.start_date between a.start_date and a.end_date)
and (b.end_date between a.start_date and b.end_date ))
order by end_date desc

I am trying to do in SQL Server

koala
  • 91
  • 1
  • 2
  • 6

2 Answers2

7

One way of doing this reasonably efficiently is

WITH T1
     AS (SELECT *,
                MAX(end_date) OVER (PARTITION BY sID ORDER BY start_date) AS max_end_date_so_far
         FROM   YourTable),
     T2
     AS (SELECT *,
                range_start = IIF(start_date <= LAG(max_end_date_so_far) OVER (PARTITION BY sID ORDER BY start_date), 0, 1),
                next_range_start = IIF(LEAD(start_date) OVER (PARTITION BY sID ORDER BY start_date) <= max_end_date_so_far, 0, 1)
         FROM   T1)
SELECT SId,
       start_date,
       end_date
FROM   T2
WHERE  0 IN ( range_start, next_range_start ); 

if you have an index on (sID, start_date) INCLUDE (end_date) this can perform the work with a single ordered scan.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
6

Your logic is not totally correct, although it almost works on your sample data. The specific reason it fails is because between includes the end points, so any given row matches itself. That said, the logic still isn't correct because it doesn't catch this situation:

 a-------------a
      b----b

Here is correct logic:

select a.*
from table a
where exists (select 1
              from table b
              where a.sid = b.sid and
                    a.start_date < b.end_date and
                    a.end_date > b.start_date and
                    (a.start_date <> b.start_date or  -- filter out the record itself
                     a.end_date <> b.end_date
                    )
             )
order by a.end_date;

The rule for overlapping time periods (or ranges of any sort) is that period 1 overlaps with period 2 when period 1 starts before period 2 ends and period 1 ends after period 2 starts. Happily, there is no need or use for between for this purpose. (I strongly discourage using between with date/time operands.)

I should note that this version does not consider two time periods to overlap when one ends on the same day another begins. That is easily adjusted by changing the < and > to <= and >=.

Here is a SQL Fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • My table has many other sIDs. I tried running your query and filtered it on where sID=1 and it still giving me the same result. – koala Apr 14 '18 at 22:03
  • @koala . . . Right. You don't have a unique id identifying each row, so normally there would be `and a.id <> b.id`. I added similar logic based on the start and end dates. – Gordon Linoff Apr 14 '18 at 22:13
  • Now when I don't filter on sID=1, it only gives me those 2 rows and not any overlapping date ranges for any other sID – koala Apr 14 '18 at 22:16
  • @koala . . . Nothing in the answer filters to `sID = 1`. You must have something else wrong with your query. I included a SQL Fiddle to show that it works. – Gordon Linoff Apr 14 '18 at 22:23
  • You're right, works now! sorry about that and thanks! Can you please explain me the logic behind "a.start_date <> b.start_date or a.end_date <> b.end_date" – koala Apr 14 '18 at 22:38
  • Also, based on the dates in question now, the query won't give 2003-07-23 2010-05-02 – koala Apr 14 '18 at 22:52
  • @koala . . . If you want overlapping ranges ignoring `sid`, then remove that condition from the subquery. Based on your original question, the assumption that you want them within an `sid` is very reasonable, especially because you have that in *your* query. Otherwise, the stated row doesn't overlap with anything else. – Gordon Linoff Apr 14 '18 at 22:54