Combine consecutive date ranges

Question

Using SQL Server 2008 R2, I'm trying to combine date ranges into the maximum date range, given that one end date is the day before the following start date.

The data is about different employments. Some employees may have ended their employment and rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment running back-to-back (one enddate immediately followed by the next startdate); in this case they should be considered one employment in total (example ID 30).

An employment period that has not ended has an enddate that is null.

Some examples are probably enlightening:

declare @t as table  (employmentid int, startdate datetime, enddate datetime)

insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)

-- expected outcome
EmploymentId StartDate   EndDate
5            2007-12-03  2011-08-26
5            2013-05-02  NULL
30           2006-10-02  NULL
66           2007-09-24  NULL

I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.

Shouldn't `startDate == endDate` for a proper overlap? Otherwise there will be 24 hours that are unaccounted for. — MaxH, Apr 03 '13 at 09:21
this will be the stored procedure, yes? or your restricted by query? — orrollo, Apr 03 '13 at 09:37
@MaxH: Actually, the datetimes are used as dates. So the overlap is ok. — Jonas Lincoln, Apr 03 '13 at 11:39
@JonasLincoln: Yes, I understand that, but if you were to calculate the number of days an employee has been employed, you would get different results. In the example above, employmentid 30 has worked 1567 + 573 + 234 = 2374 days (with null = 2013-04-04 = today). This is not the same as the summary for employmentid 30 (2376 days from 2006-10-02 to 2013-04-04). You will be 1 day short for each change of employment type. — MaxH, Apr 04 '13 at 12:17
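
The day counts in the comment above can be reproduced with DATEDIFF. This is only a sketch for checking the arithmetic; it uses 2013-04-04 as the stand-in for the open end date, as the comment does.

```sql
-- Sum of the three separate spans vs. the merged span for employmentid 30
select datediff(day, '20061002', '20110116')
     + datediff(day, '20110117', '20120812')
     + datediff(day, '20120813', '20130404') as sum_of_parts,  -- 1567 + 573 + 234 = 2374
       datediff(day, '20061002', '20130404') as whole_range;   -- 2376
```

One day goes missing at each of the two changes of employment type, which accounts for the difference of 2.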

score 16 · Accepted Answer · answered Apr 03 '13 at 09:32

The strange bit you see with my use of the date '32121231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple recursive Common Table Expression to combine the ranges.

To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

create table Tbl (
  employmentid int,
  startdate datetime,
  enddate datetime);

insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);

/*
-- expected outcome
EmploymentId StartDate   EndDate
5            2007-12-03  2011-08-26
5            2013-05-02  NULL
30           2006-10-02  NULL
66           2007-09-24  NULL
*/

Query 1:

;with cte as (
   select a.employmentid, a.startdate, a.enddate
     from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
    where b.employmentid is null
    union all
   select a.employmentid, a.startdate, b.enddate
     from cte a
     join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
   select employmentid,
          startdate,
          nullif(max(isnull(enddate,'32121231')),'32121231') enddate
     from cte
 group by employmentid, startdate
 order by employmentid

Results:

| EMPLOYMENTID |                        STARTDATE |                       ENDDATE |
-----------------------------------------------------------------------------------
|            5 |  December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
|            5 |       May, 02 2013 00:00:00+0000 |                        (null) |
|           30 |   October, 02 2006 00:00:00+0000 |                        (null) |
|           66 | September, 24 2007 00:00:00+0000 |                        (null) |
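
To see what the anchor part of the CTE does by itself, the sketch below runs only the first half of the query against the sample data. The data is loaded into a table variable `@t` (a name borrowed from the question, not from this answer) so the block is self-contained, and `dateadd(day, -1, ...)` stands in for the `startdate-1` shorthand.

```sql
declare @t table (employmentid int, startdate datetime, enddate datetime);

insert into @t values
(5,  '20071203', '20110826'),
(5,  '20130502', null),
(30, '20061002', '20110116'),
(30, '20110117', '20120812'),
(30, '20120813', null),
(66, '20070924', null);

-- Anchor only: keep ranges for which no range of the same employee
-- ends on the day before they start (i.e. ranges that begin a new period).
select a.employmentid, a.startdate, a.enddate
  from @t a
  left join @t b on a.employmentid = b.employmentid
                and dateadd(day, -1, a.startdate) = b.enddate
 where b.employmentid is null
 order by a.employmentid, a.startdate;
-- expected: four rows, starting 2007-12-03 and 2013-05-02 (id 5),
-- 2006-10-02 (id 30) and 2007-09-24 (id 66)
```

The recursive part then extends each of these rows one adjacent range at a time. If an employee could have more than about 100 back-to-back ranges, the full query would hit SQL Server's default recursion limit of 100; appending `option (maxrecursion 0)` to the final select lifts that limit.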

Six years later and this is still a fantastic solution for small date groups. Thank you! — Joe S, May 24 '19 at 18:38
shouldn't first projection in cte should be `;with cte as ( select a.employmentid, b.startdate, a.enddate` . b.startdate instead of a.startdate? — nir, Mar 11 '21 at 18:42

score 2 · Answer 2 · edited Mar 16 '18 at 04:51

SET NOCOUNT ON

DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)

INSERT INTO @T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'


SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
       s1.FromDate, 
       MIN(t1.ToDate) AS ToDate 
FROM @T s1 
INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate 
  AND NOT EXISTS(SELECT * FROM @T t2 
                 WHERE t1.ToDate >= t2.FromDate
                   AND t1.ToDate < t2.ToDate) 
WHERE NOT EXISTS(SELECT * FROM @T s2 
                 WHERE s1.FromDate > s2.FromDate
                   AND s1.FromDate <= s2.ToDate) 
GROUP BY s1.FromDate 
ORDER BY s1.FromDate

answered Mar 16 '18 at 04:32 by user9501037 · edited by Ubercool

Instead of providing plain code, try to explain the thought process to benefit all looking for answer. – Ubercool Mar 16 '18 at 04:44
It looks like the logic is this: after all ranges are merged, the first range in a group of merged ranges has a start date that is not inside any other range, and the last range in a group has an end date that is not inside any other range. The query finds all first ranges (`s1`) and, for each, the corresponding last range (`MIN(t1.ToDate)` is the earliest last range ending after `s1` starts). The EXISTS conditions limit s1 to first ranges and t1 to last ranges. – Victor Feb 05 '20 at 23:03

score 2 · Answer 3 · answered Jun 19 '20 at 21:27

An alternative solution that uses window functions rather than recursive CTEs (the ORDER BY / ROWS frame on SUM() OVER requires SQL Server 2012 or later):

SELECT 
    employmentid, 
    MIN(startdate) as startdate, 
    NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
    SELECT 
        employmentid, 
        startdate, 
        enddate,
        DATEADD(
            DAY, 
            -COALESCE(
                SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 
                0
            ),
            startdate
        ) as grp
    FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate

This works by calculating a grp value that will be the same for all consecutive rows. This is achieved by:

Determine the total days each span occupies (+1 because the dates are inclusive)

SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t

Take the cumulative sum of the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans.
- We coalesce with 0 to ensure we don't have NULLs in our cumulative sum of days spanned.
- We do not include the current row in the cumulative sum, because we will apply the value to startdate rather than enddate (we can't apply it to enddate because of the NULLs).

SELECT *, COALESCE(
    SUM(daysSpanned) OVER (
        PARTITION BY employmentid 
        ORDER BY startdate 
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
    ,0
)  as cumulativeDaysSpanned
FROM (
    SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1

Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
- If the startdate increases at the same rate as the days spanned, the spans are consecutive and subtracting the two gives the same value.
- If the startdate increases faster than the days spanned, there is a gap and we get a new grp value greater than the previous one.
- Although grp is a date, the date itself is meaningless; we use it only as a grouping value.

SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
    SELECT *, COALESCE(
        SUM(daysSpanned) OVER (
            PARTITION BY employmentid 
            ORDER BY startdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
        ,0
    )  as cumulativeDaysSpanned
    FROM (
        SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
    ) inner1
) inner2

With the results

+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate               | enddate                 | daysSpanned | cumulativeDaysSpanned | grp                     |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363        | 0                     | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5            | 2013-05-02 00:00:00.000 | NULL                    | NULL        | 1363                  | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568        | 0                     | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574         | 1568                  | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30           | 2012-08-13 00:00:00.000 | NULL                    | NULL        | 2142                  | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66           | 2007-09-24 00:00:00.000 | NULL                    | NULL        | 0                     | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+

Finally we can GROUP BY grp to collapse the consecutive spans into one row.
- Use MIN and MAX to get the new startdate and enddate.
- To handle NULL enddates, we give them a large value so that MAX picks them up, then convert them back to NULL.

SELECT 
    employmentid, 
    MIN(startdate) as startdate, 
    NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
    SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
    FROM (
        SELECT *, COALESCE(
            SUM(daysSpanned) OVER (
                PARTITION BY employmentid 
                ORDER BY startdate 
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            )
            ,0
        )  as cumulativeDaysSpanned
        FROM (
            SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
        ) inner1
    ) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate

To get the desired result

+--------------+-------------------------+-------------------------+
| employmentid | startdate               | enddate                 |
+--------------+-------------------------+-------------------------+
| 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5            | 2013-05-02 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+
| 30           | 2006-10-02 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+
| 66           | 2007-09-24 00:00:00.000 | NULL                    |
+--------------+-------------------------+-------------------------+

We can combine the inner queries to get the query at the start of this answer, which is shorter but harder to explain.

Limitations: all of this requires that
- there are no overlaps between the startdate/enddate ranges of an employment, since these could produce collisions in grp;
- startdate is not NULL (this could be overcome by replacing NULL start dates with a small date value);
- future developers can decipher the window black magic you performed.
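
Since the question targets SQL Server 2008 R2, where an aggregate's OVER clause accepts neither ORDER BY nor a ROWS frame, the same grp value could be computed with a correlated subquery instead. The following is a sketch under the same assumptions as above (no overlaps, distinct non-NULL startdates per employment), not a tuned implementation; the sample data is repeated so the block runs by itself.

```sql
declare @t table (employmentid int, startdate datetime, enddate datetime);

insert into @t values
(5,  '20071203', '20110826'),
(5,  '20130502', null),
(30, '20061002', '20110116'),
(30, '20110117', '20120812'),
(30, '20120813', null),
(66, '20070924', null);

select employmentid,
       min(startdate) as startdate,
       nullif(max(coalesce(enddate, '99990101')), '99990101') as enddate
from (
    select t.employmentid, t.startdate, t.enddate,
           -- days spanned by all earlier ranges of the same employment,
           -- subtracted from startdate to form the grouping value
           dateadd(day,
                   -coalesce((select sum(datediff(day, p.startdate, p.enddate) + 1)
                              from @t p
                              where p.employmentid = t.employmentid
                                and p.startdate < t.startdate), 0),
                   t.startdate) as grp
    from @t t
) withGroup
group by employmentid, grp
order by employmentid, startdate;
```

On the sample data this should return the same four rows as the window-function version. The subquery is evaluated per row, so it is likely to scale worse than a windowed SUM on large tables.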

score 1 · Answer 4 · answered Nov 16 '17 at 10:31

A modified script for combining all overlapping periods.
For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015

will give one period:
01.01.2001-05.05.2015

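tbl.enddate must be filled in for every row (no NULLs). Judging by the final select, open-ended periods appear to be expected as '31.12.2099', which is turned back into NULL.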

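;WITH cte
  AS(
-- anchor: ranges that do not start inside (or right after) an earlier-starting range
SELECT
  a.employmentid
  ,a.startdate
  ,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
    and a.startdate > c.startdate
    and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL

UNION all

-- recursive step: extend the range with an adjacent or overlapping range
-- of the same employment that ends later (so enddate always grows)
SELECT
  a.employmentid
  ,a.startdate
  ,c.enddate
from cte a
inner join tbl c on a.employmentid=c.employmentid
    and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
-- note: the literal '31.12.2099' depends on the session's date format
select employmentid,
          startdate,
          nullif(max(enddate),'31.12.2099') enddate
from cte
group by employmentid, startdate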
