Combine continuous datetime intervals by type

Question

Say we have such a table:

declare @periods table (
    s date, 
    e date,
    t tinyint
);

with date intervals without gaps ordered by start date (s)

insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

All date intervals have different types (t).

It is required to combine date intervals of the same type where they are not broken by intervals of the other types (having all intervals ordered by start date).

So the result table should look like:

      s     |      e     |  t
------------|------------|-----
 2013-01-01 | 2013-01-02 |  3
 2013-01-02 | 2013-01-05 |  1
 2013-01-05 | 2013-01-08 |  2
 2013-01-08 | 2013-01-09 |  1

Any ideas how to do this without cursor?

I've got one working solution:

declare @periods table (
    s datetime primary key clustered, 
    e datetime,
    t tinyint,
    period_number int   
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

declare @t tinyint = null;  
declare @PeriodNumber int = 0;
declare @anchor date;

update @periods
    set  period_number = @PeriodNumber, 
    @PeriodNumber = case
                        when @t <> t
                            then  @PeriodNumber + 1
                        else
                            @PeriodNumber
                    end,
    @t = t,
    @anchor = s
option (maxdop 1);

select 
    s = min(s),
    e = max(e),
    t = min(t)
from 
    @periods    
group by 
    period_number
order by 
    s;

but I doubt if I can rely on such a behavior of UPDATE statement?

I use SQL Server 2008 R2.

Edit:

Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/

I found three important things that were missed in the solution above:

There must be clustered index on the table
There must be anchor variable and call of the clustered column
Update statement should be executed by one processor, i.e. without parallelism

I've changed the above solution in accordance with these rules.

this solution realy works...? if you cutoff the register ('2013-01-06' , '2013-01-07', 2) will return same result.. and didnt you would spect a line for ('2013-01-05' , '2013-01-06', 2) and other for ('2013-01-07' , '2013-01-08', 2) cause they are broken a interval? — Frederic, Feb 19 '13 at 18:14
To Martin Smith: I use SQL Server 2008 R2. To Frederic: Sorry, I didn't quite catch you. Yes, the above solution works, I checked it before posting. One of the condition for the task is that all the intervals are continuous or successive, following one after another without gaps. — Vadim Loboda, Feb 19 '13 at 18:49
You can use a "pseudo-cursor". See this http://www.sqlservercentral.com/articles/T-SQL/68467/ and look for the "Quirky Update" Running Total. But you'll have to use a temp table, not a table variable. It's a technique I've used a lot and it's working great every time. — Daniel, Feb 19 '13 at 18:53
Daniel, you directed me to the right article. It was the last link in the chain. Thank you!!! — Vadim Loboda, Feb 19 '13 at 20:09
By the way the article shows the use of both: table variable and temp table. What is the restriction about table variable for "Quirky Update" technique? — Vadim Loboda, Feb 19 '13 at 20:23
As a matter of fact, your solution does contain iteration (in `cte1`). Howsoever, please consider removing your solution from the question and re-posting it as your own answer instead. — Andriy M, Feb 20 '13 at 07:08
Andriy, let me not agree with you about iteration - cte1 is just a UNION and not a recursive CTE. Thank you for your advice, I posted my solution as my own answer. — Vadim Loboda, Feb 20 '13 at 07:21

score 5 · Accepted Answer · answered Feb 24 '13 at 21:44

Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t value, you could group all the rows using that criterion, then just take MIN(s), MAX(e) for every group.

One method of obtaining such a criterion is to use two ROW_NUMBER calls. Consider the following query:

SELECT
  *,
  rnk1 = ROW_NUMBER() OVER (               ORDER BY s),
  rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;

For your example it would return the following set:

s           e           t   rnk1  rnk2
----------  ----------  --  ----  ----
2013-01-01  2013-01-02  3   1     1
2013-01-02  2013-01-04  1   2     1
2013-01-04  2013-01-05  1   3     2
2013-01-05  2013-01-06  2   4     1
2013-01-06  2013-01-07  2   5     2
2013-01-07  2013-01-08  2   6     3
2013-01-08  2013-01-09  1   7     3

The interesting thing about the rnk1 and rnk2 rankings is that if you subtract one from the other, you will get values that, together with t, uniquely identify every distinct sequence of rows with the same t:

s           e           t   rnk1  rnk2  rnk1 - rnk2
----------  ----------  --  ----  ----  -----------
2013-01-01  2013-01-02  3   1     1     0
2013-01-02  2013-01-04  1   2     1     1
2013-01-04  2013-01-05  1   3     2     1
2013-01-05  2013-01-06  2   4     1     3
2013-01-06  2013-01-07  2   5     2     3
2013-01-07  2013-01-08  2   6     3     3
2013-01-08  2013-01-09  1   7     3     4

Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:

WITH partitioned AS (
  SELECT
    *,
    g = ROW_NUMBER() OVER (               ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
  FROM @periods
)
SELECT
  s = MIN(s),
  e = MAX(e),
  t
FROM partitioned
GROUP BY
  t,
  g
;

If you like, you can play with this solution at SQL Fiddle.

Yes! This is the solution I was looking for! Thank you Andriy! — Vadim Loboda, Feb 25 '13 at 07:14

score 2 · Answer 2 · answered Feb 19 '13 at 18:54

2

How about this?

declare @periods table (
    s datetime primary key, 
    e datetime,
    t tinyint,
    s2 datetime
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

update @periods set s2 = s;

while @@ROWCOUNT > 0
begin
    update p2 SET s2=p1.s
    from @periods p1
    join @PERIODS P2 ON p2.t = p1.t AND p2.s2 = p1.e;
end

select s2 as s, max(e) as e, min(t) as t
from @periods
group by s2
order by s2;

answered Feb 19 '13 at 18:54

GilM

3,711
17
18

GilM, thank you. Interesting and nontrivial solution. I would never think that way. But.. update statement in this example executes 4 times. While this is not the cursor, I think it can be done without iterations at all. – Vadim Loboda Feb 19 '13 at 19:23
I suspect some kind of iteration will be necessary, even if it's hidden within a recursive CTE or something. Your solution depends on no gaps in the intervals, and on the UPDATE processing the rows in order (which I don't think is guaranteed, even with the primary key). It was just a suggestion. At least I think it guarantees the correct result. – GilM Feb 19 '13 at 21:46
Yes, the result is guaranteed. I realy like your solution. I had to toughly meditate on it to understand its elegance and the way it works. However it is not exactly how I would like it to look like. Because I guess that cursors and iteration is evil, at least until that time when you can not avoid it! :) – Vadim Loboda Feb 20 '13 at 07:51
A very interesting approach, like @lobodava, I would probably never have thought of this! – Andriy M Feb 24 '13 at 21:46
Yeah, and it's probably an obsolete approach (and can be messed up if somebody adds a statement that changes @@Rowcount before the loop end, so it's probably best to capture it in a variable). This is the kind of thing I used to do for things like traversing arbitrarily deep hierarchies before recursive CTE's were available. – GilM Feb 25 '13 at 15:02

Frederic · Answer 3 · 2013-02-19T19:42:48.663

a possibly solution to avoid update and cursor should be using common table expressions...

like this...

    declare @periods table (
        s date, 
        e date,
        t tinyint
    );


    insert into @periods values
    ('2013-01-01' , '2013-01-02', 3),
    ('2013-01-02' , '2013-01-04', 1),
    ('2013-01-04' , '2013-01-05', 1),
    ('2013-01-05' , '2013-01-06', 2),
    ('2013-01-06' , '2013-01-07', 2),
    ('2013-01-07' , '2013-01-08', 2),
    ('2013-01-08' , '2013-01-09', 1);

    with cte as ( select 0 as n
                        ,p.s as s
                        ,p.e as e
                        ,p.t
                        ,case when p2.s is null then 1 else 0 end fl_s
                        ,case when p3.e is null then 1 else 0 end fl_e
                  from @periods p
                  left outer join @periods p2
                  on p2.e = p.s
                  and p2.t = p.t
                  left outer join @periods p3
                  on p3.s = p.e
                  and p3.t = p.t

                  union all 

                  select  n+1 as n
                        , p2.s as s
                        , p.e as e
                        ,p.t
                        ,case when not exists(select * from @periods p3 where p3.e =p2.s and p3.t=p2.t) then 1 else 0 end as fl_s
                        ,p.fl_e as fl_e
                  from cte p
                  inner join @periods p2
                  on p2.e = p.s
                  and p2.t = p.t
                  where p.fl_s = 0

                  union all 

                  select  n+1 as n
                        , p.s as s
                        , p2.e as e
                        ,p.t
                        ,p.fl_s as fl_s
                        ,case when not exists(select * from @periods p3 where p3.s =p2.e and p3.t=p2.t) then 1 else 0 end as fl_e
                  from cte p
                  inner join @periods p2
                  on p2.s = p.e
                  and p2.t = p.t
                  where p.fl_s = 1
                  and p.fl_e = 0
    )
    ,result as (select s,e,t,COUNT(*) as count_lines
                 from cte
                 where fl_e = 1
                 and fl_s = 1
                 group by s,e,t
                 )
    select * from result
    option(maxrecursion 0)

resultset achieved...

    s           e           t   count_lines
    2013-01-01  2013-01-02  3   1
    2013-01-02  2013-01-05  1   2
    2013-01-05  2013-01-08  2   3
    2013-01-08  2013-01-09  1   1

I would not say that this is the optimal solution, but I like the idea of recursive cte. Thank you, Federic — Vadim Loboda, Feb 19 '13 at 20:27

Vadim Loboda · Answer 4 · 2013-02-20T16:17:16.600

Hooray! I've found the solution that suits me and it is done without iteration

with cte1 as (   
    select s, t  from @periods
    union all
    select max(e), null from @periods
),
cte2 as (
    select rn = row_number() over(order by s), s, t from cte1   
),
cte3 as (
    select 
        rn = row_number() over(order by a.rn),
        a.s,
        a.t 
    from 
        cte2 a 
        left join cte2 b on a.rn = b.rn + 1 and a.t = b.t
    where
        b.rn is null 
)
select 
    s = a.s, 
    e = b.s, 
    a.t  
from 
    cte3 a 
    inner join cte3 b on b.rn = a.rn + 1;

Thanks everyone for sharing your thoughts and solutions!

Details:

cte1 returns the chain of dates with the types after them:

s          t
---------- ----
2013-01-01 3
2013-01-02 1
2013-01-04 1
2013-01-05 2
2013-01-06 2
2013-01-07 2
2013-01-08 1
2013-01-09 NULL  -- there is no type *after* the last date

ct2 just add row number to the above result:

 rn       s       t
---- ----------  ----
 1    2013-01-01  3
 2    2013-01-02  1
 3    2013-01-04  1
 4    2013-01-05  2
 5    2013-01-06  2
 6    2013-01-07  2
 7    2013-01-08  1
 8    2013-01-09  NULL

if we output all the fields from the query in cte3 without where condition, we get the following results:

select * from cte2 a left join cte2 b on a.rn = b.rn + 1 and a.t = b.t;

rn       s        t       rn      s          t
---- ----------  ----    ------ ----------  ----
1    2013-01-01  3       NULL   NULL        NULL
2    2013-01-02  1       NULL   NULL        NULL
3    2013-01-04  1       2      2013-01-02  1
4    2013-01-05  2       NULL   NULL        NULL
5    2013-01-06  2       4      2013-01-05  2
6    2013-01-07  2       5      2013-01-06  2
7    2013-01-08  1       NULL   NULL        NULL
8    2013-01-09  NULL    NULL   NULL        NULL

For the dates where type is repeted there are values on the right side of the results. So we can just remove all the lines where values exist on the right side.

So cte3 returns:

rn    s           t
----- ----------  ----
1     2013-01-01  3
2     2013-01-02  1
3     2013-01-05  2
4     2013-01-08  1
5     2013-01-09  NULL

Note that because of the removal some rows there are some gaps in rn sequence, so we have to renumber them again.

From here only one thing left - to transform the dates to periods:

select 
    s = a.s, 
    e = b.s, 
    a.t  
from 
    cte3 a 
    inner join cte3 b on b.rn = a.rn + 1;

and we've got the required result:

s          e          t
---------- ---------- ----
2013-01-01 2013-01-02 3
2013-01-02 2013-01-05 1
2013-01-05 2013-01-08 2
2013-01-08 2013-01-09 1

You were right, there's no iteration here. It was indeed the `cte1` that I carelessly thought of as recursive. Good job finding a working solution, even better one explaining how it works! — Andriy M, Feb 24 '13 at 19:30

score 0 · Answer 5 · answered Feb 19 '13 at 19:18

this is your solution with a different data on the table..

    declare @periods table (
        s datetime primary key, 
        e datetime,
        t tinyint,
        period_number int   
    );

    insert into @periods (s, e, t) values
    ('2013-01-01' , '2013-01-02', 3),
    ('2013-01-02' , '2013-01-04', 1),
    ('2013-01-04' , '2013-01-05', 1),
    ('2013-01-05' , '2013-01-06', 2),
    ('2013-01-09' , '2013-01-10', 2),
    ('2013-01-10' , '2013-01-11', 1);

    declare @t tinyint = null;  
    declare @PeriodNumber int = 0;

    update @periods
        set  period_number = @PeriodNumber, 
        @PeriodNumber = case
                            when @t <> t
                                then  @PeriodNumber + 1
                            else
                                @PeriodNumber
                        end,
        @t = t;

    select 
        s = min(s),
        e = max(e),
        t = min(t)
    from 
        @periods    
    group by 
        period_number
    order by 
        s;

where have a gap between

 ('2013-01-05' , '2013-01-06', 2),
 --and
 ('2013-01-09' , '2013-01-10', 2),

your solution resultset is..

    s           e           t
    2013-01-01  2013-01-02  3
    2013-01-02  2013-01-05  1
    2013-01-05  2013-01-10  2
    2013-01-10  2013-01-11  1

isnt was spected the resultset like this..??

    s           e           t
    2013-01-01  2013-01-02  3
    2013-01-02  2013-01-05  1
    2013-01-05  2013-01-06  2
    2013-01-09  2013-01-10  2
    2013-01-10  2013-01-11  1

maybe I did misunderstood the rule of your problem...

In my real task I build those intervals myself and make sure that there are no any gaps like in your example. So my question is about the intevals than following one after another without gaps. — Vadim Loboda, Feb 19 '13 at 19:28
@lobodava oh.. ok.. sorry, I didnt knew that. in this conditions your solution will works fine.. ty for explanation.. I did post other answer with the use of cte as an alternative.. — Frederic, Feb 19 '13 at 19:38

Combine continuous datetime intervals by type

5 Answers5

Linked