0

I have multiple rows having start time & end time columns that overlap each other.

I need to find distinct time intervals using SQL.

Sample data:

(6 -> 7)
(6.30 -> 6.45)
(8 -> 9)
(8.30 -> 9.30)

Output:

(6 -> 7)
(8 -> 9.30)
dnoeth
  • 59,503
  • 4
  • 39
  • 56
Neha
  • 233
  • 1
  • 3
  • 11
  • 2
    What dbms? How is the timestamp stored? –  Mar 03 '16 at 08:42
  • 1
    Vertica db. Start time/End Time in Timestamp format : 2016-03-03 14:19:05 – Neha Mar 03 '16 at 08:48
  • This is a *gaps & islands* problem, which has been answered several times on SO: http://stackoverflow.com/questions/tagged/gaps-and-islands+sql Vertica supports Analytical functions, so it should be easy to modify one of those answers... – dnoeth Mar 03 '16 at 09:03
  • 1
    Hi @dnoeth I found two answers (ref - http://stackoverflow.com/questions/4490553/detect-overlapping-date-ranges-from-the-same-table) but none seem to solve my problem here. Could you help me out with a specific example? – Neha Mar 03 '16 at 11:10
  • Have a look at Itzik Ben Gan's solution for SQL Server http://blogs.solidq.com/en/sqlserver/packing-intervals/ There seems to be a problem with the website currently, this is a previous one http://sqlmag.com/blog/tsql-challenge-packing-date-and-time-intervals This is another one I posted for Teradata http://stackoverflow.com/questions/17946389/time-interval-overlaps-teradata/17957283#17957283 – dnoeth Mar 03 '16 at 11:25

2 Answers2

2

I would comment on Mauro's but I don't have the rep. Unfortunately his answer doesn't consider what happens when you have more than 2 overlapping periods.

Here is my solution:

--create the table for the purposes of this demo
drop schema if exists TEST1 cascade;     
create schema if not exists TEST1;
drop table if exists      TEST1.otest;
create table if not exists TEST1.otest(t1 datetime, t2 datetime);

--create some example data
--example where 2nd period is entirely inside the first
insert into TEST1.otest(t1, t2) select '2016-03-04 06:00:00' ,'2016-03-04 07:00:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 06:30:00' ,'2016-03-04 06:45:00';

--example of multiple consecutive periods
insert into TEST1.otest(t1, t2) select '2016-03-04 08:00:00' ,'2016-03-04 09:00:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:15:00' ,'2016-03-04 08:25:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:26:00' ,'2016-03-04 08:27:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:28:00' ,'2016-03-04 08:29:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:30:00' ,'2016-03-04 09:30:00';

--example of another overlapping period extending the end time
insert into TEST1.otest(t1, t2) select '2016-03-04 10:00:00' ,'2016-03-04 10:30:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 10:15:00' ,'2016-03-04 10:45:00';

--query syntax
with i as (select * from TEST1.otest) 
,i2 as (select * ,max(t2) over (order by t1) as maxT2 from i)
,i3 as (select *, lag(i2.maxT2) over (order by t1) as laggedMaxT2 from i2)
,i4 as (select *, conditional_true_event(i3.t1 > i3.laggedMaxT2) over (order by t1) as grouper from i3)
select min(t1) as collapsedT1, max(t2) as collapsedT2 from i4 group by grouper
order by collapsedT1;

--results
    collapsedT1         |collapsedT2         |
--------------------|--------------------|
2016-03-04 06:00:00 |2016-03-04 07:00:00 |
2016-03-04 08:00:00 |2016-03-04 09:30:00 |
2016-03-04 10:00:00 |2016-03-04 10:45:00 |

Edit: if your data is categorised by other columns, remember to add partition clauses to the max, conditional_true_event and lag analytics or you could get non-deterministic results.

ThatDataGuy
  • 1,969
  • 2
  • 17
  • 43
0

Vertica has very powerful "time series" and "conditional events" analytics functions. Your problem can be easily solved this way...

Suppose this is your start table:

SQL> select * from otest ;
        t1          |         t2          
--------------------+--------------------
2016-03-04 06:00:00 | 2016-03-04 07:00:00
2016-03-04 06:30:00 | 2016-03-04 06:45:00
2016-03-04 08:00:00 | 2016-03-04 09:00:00
2016-03-04 08:30:00 | 2016-03-04 09:30:00
(4 rows)

Where t1 is your start timestamp and t2 is your end timestamp. All you have to do is:

SQL> select 
         min(a.t1), 
         max(a.t2) 
     from ( 
         select 
            t1, 
            t2, 
            conditional_true_event ( t1 >= lag(t2) ) 
               over ( order by t1 ) as cte 
         from otest ) a  
     group by cte 
     order by 1 ;

        min         |         max         
--------------------+--------------------
2016-03-04 06:00:00 | 2016-03-04 07:00:00
2016-03-04 08:00:00 | 2016-03-04 09:30:00
(2 rows)
mauro
  • 5,730
  • 2
  • 26
  • 25