I have multiple rows having start time & end time columns that overlap each other.
I need to find distinct time intervals using SQL.
Sample data:
(6 -> 7)
(6.30 -> 6.45)
(8 -> 9)
(8.30 -> 9.30)
Output:
(6 -> 7)
(8 -> 9.30)
I have multiple rows having start time & end time columns that overlap each other.
I need to find distinct time intervals using SQL.
Sample data:
(6 -> 7)
(6.30 -> 6.45)
(8 -> 9)
(8.30 -> 9.30)
Output:
(6 -> 7)
(8 -> 9.30)
I would comment on Mauro's but I don't have the rep. Unfortunately his answer doesn't consider what happens when you have more than 2 overlapping periods.
Here is my solution:
--create the table for the purposes of this demo
drop schema if exists TEST1 cascade;
create schema if not exists TEST1;
drop table if exists TEST1.otest;
create table if not exists TEST1.otest(t1 datetime, t2 datetime);
--create some example data
--example where 2nd period is entirely inside the first
insert into TEST1.otest(t1, t2) select '2016-03-04 06:00:00' ,'2016-03-04 07:00:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 06:30:00' ,'2016-03-04 06:45:00';
--example of multiple consecutive periods
insert into TEST1.otest(t1, t2) select '2016-03-04 08:00:00' ,'2016-03-04 09:00:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:15:00' ,'2016-03-04 08:25:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:26:00' ,'2016-03-04 08:27:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:28:00' ,'2016-03-04 08:29:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 08:30:00' ,'2016-03-04 09:30:00';
--example of another overlapping period extending the end time
insert into TEST1.otest(t1, t2) select '2016-03-04 10:00:00' ,'2016-03-04 10:30:00';
insert into TEST1.otest(t1, t2) select '2016-03-04 10:15:00' ,'2016-03-04 10:45:00';
--query syntax
with i as (select * from TEST1.otest)
,i2 as (select * ,max(t2) over (order by t1) as maxT2 from i)
,i3 as (select *, lag(i2.maxT2) over (order by t1) as laggedMaxT2 from i2)
,i4 as (select *, conditional_true_event(i3.t1 > i3.laggedMaxT2) over (order by t1) as grouper from i3)
select min(t1) as collapsedT1, max(t2) as collapsedT2 from i4 group by grouper
order by collapsedT1;
--results
collapsedT1 |collapsedT2 |
--------------------|--------------------|
2016-03-04 06:00:00 |2016-03-04 07:00:00 |
2016-03-04 08:00:00 |2016-03-04 09:30:00 |
2016-03-04 10:00:00 |2016-03-04 10:45:00 |
Edit: if your data is categorised by other columns, remember to add partition clauses to the max, conditional_true_event and lag analytics or you could get non-deterministic results.
Vertica has very powerful "time series" and "conditional events" analytics functions. Your problem can be easily solved this way...
Suppose this is your start table:
SQL> select * from otest ;
t1 | t2
--------------------+--------------------
2016-03-04 06:00:00 | 2016-03-04 07:00:00
2016-03-04 06:30:00 | 2016-03-04 06:45:00
2016-03-04 08:00:00 | 2016-03-04 09:00:00
2016-03-04 08:30:00 | 2016-03-04 09:30:00
(4 rows)
Where t1
is your start timestamp and t2
is your end timestamp. All you have to do is:
SQL> select
min(a.t1),
max(a.t2)
from (
select
t1,
t2,
conditional_true_event ( t1 >= lag(t2) )
over ( order by t1 ) as cte
from otest ) a
group by cte
order by 1 ;
min | max
--------------------+--------------------
2016-03-04 06:00:00 | 2016-03-04 07:00:00
2016-03-04 08:00:00 | 2016-03-04 09:30:00
(2 rows)