Select min/max dates for periods that don't intersect

Question

Example! I have a table with 4 columns. date format dd.MM.yy

id   ban   start     end
1    1     01.01.15  31.12.18
1    1     02.02.15  31.12.18
1    1     05.04.15  31.12.17

In this case dates from rows 2 and 3 are included in dates from row 1

1    1     02.04.19  31.12.20
1    1     05.05.19  31.12.20

In this case dates from row 5 are included in dates from rows 4. Basically we have 2 periods that don't intersect.

01.01.15  31.12.18

and

02.04.19  31.12.20

Situation where a date starts in one period and ends in another are impossible. The end result should look like this

1    1     01.01.15  31.12.18
1    1     02.04.19  31.12.20

I tried using analitical functions(LAG)

select id
  , ban
  , case 
    when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start) 
    and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
    then nvl(lag(start) over (partition by id, ban order by start, end asc), start) 
    else start
    end as start
  , case 
    when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start) 
    and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
    then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
    else end
    end as end
from table

Where I order rows and if current dates are included in previous I replace them. It works if I have just 2 rows. For example this

1   1    08.09.15   31.12.99
1   1    31.12.15   31.12.99

turns into this

1   1    08.09.15   31.12.99
1   1    08.09.15   31.12.99

which I can then group by all fields and get what I want, but if there are more

1   2        13.11.15   31.12.99
1   2        31.12.15   31.12.99
1   2        16.06.15   31.12.99

I get

1   2        16.06.15   31.12.99
1   2        16.06.15   31.12.99
1   2        13.11.15   31.12.99

I understand why this happens, but how do I work around it? Running the query multiple times is not an option.

Which version of Oracle. If it is 12, match_recognize might be the answer: http://modern-sql.com/feature/match_recognize — Markus Winand, Nov 10 '17 at 10:44
"date format dd.MM.yy" - Dates do not have a format; they are stored internally as [7-bytes](https://stackoverflow.com/questions/13568193/how-are-dates-stored-in-oracle) and it is the user interface that formats it according to its defaults (typically the `NLS_DATE_FORMAT` session parameter) unless you provide an explicit format using `TO_CHAR`. — MT0, Nov 10 '17 at 11:04
@MT0 My mistake. I'm not very familiar with oracle. But thanks for the info) — Evgenii, Nov 10 '17 at 14:41

score 1 · Accepted Answer · answered Nov 10 '17 at 12:42

This query looks promising:

-- test data
with t(id, ban, dtstart, dtend) as (
    select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
    select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
    select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
    select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
    select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
    select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction 
            from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction, 
                 sum(direction) over (partition by id, ban order by dt) sm 
            from step1),
step3 as (select id, ban, direction, dt dt1, 
                 lead(dt) over (partition by id, ban order by dt) dt2 
            from step2      
            where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2 
  from step3 where direction = 1 order by id, ban, dt1

step1 - unpivot dates and assign 1 for start date, -1 for end date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()

You can shorten this syntax, I divided it to steps to show what's going on.

Result:

    ID        BAN DT1         DT2
------ ---------- ----------- -----------
     1          1 2015-01-01  2015-03-31
     1          1 2015-08-05  2015-12-31
     1          2 2015-01-01  2016-12-31
     2          1 2016-01-01  2017-12-31

I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().

Pivot and unpivot works in Oracle 11 and later, for earlier versions you need case when.

BTW - START is reserved word in Oracle so in my example I changed slightly column names.

Gordon Linoff · Answer 2 · 2017-11-11T19:20:05.927

0

I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:

select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, bin order by start) as grp
      from (select t.*,
                   (case when exists (select 1
                                      from t t2
                                      where t2.id = t.id and t2.ban = t.ban and
                                            t.start <= t2.end and t.end >= t2.start and
                                            t.start <> t2.start and t.end <> t2.end
                                     )
                         then 0 else 1
                    end) as start_flag
            from t
           ) t
    ) t
group by id, ban, grp;

edited Nov 11 '17 at 19:20

answered Nov 10 '17 at 10:46

Gordon Linoff

1,242,037
58
646
786

Wouldn't `case when exists` always return 0? Here `t.start <= t2.end and t.end >= t2.start`, every time a row is compared with itself, the start will obviously be before end and end after start. – Evgenii Nov 10 '17 at 14:39
@Evgenii . . . Good catch. Fixed the query. – Gordon Linoff Nov 11 '17 at 19:20

Select min/max dates for periods that don't intersect

2 Answers2