2

I am trying to aggregate overlapping effective dates. Any gaps between dates should be considered as separate rows. I am using min and max and I am getting below output but would like to see expected output.

My query

WITH test_data AS (
    SELECT '2020-01-01' AS date_from,
           '2020-01-03' AS date_to,
           '1'          AS product
    UNION ALL
    SELECT '2020-01-05' AS date_from,
           '2020-01-07' AS date_to,
           '1'          AS product
    UNION ALL
    SELECT '2020-01-06' AS date_from,
           '2020-01-10' AS date_to,
           '1'          AS product
)
SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to)   AS date_to
FROM test_data
GROUP BY 1;

Source data

date_from date_to product
2020-01-01 2020-01-03 1
2020-01-05 2020-01-07 1
2020-01-06 2020-01-10 1

Output table

date_from date_to product
2020-01-01 2020-01-10 1

Expected output

date_from date_to product
2020-01-01 2020-01-03 1
2020-01-05 2020-01-10 1

Thanks in advance !

  • Could you please add to your question the query that you're come up with? That would help the others to see what you have tried and what exactly has to be fixed. – ginkul May 06 '21 at 09:57
  • I think you're looking for [this](https://stackoverflow.com/a/8120432/12885468) – ginkul May 06 '21 at 10:26
  • @DominikGolebiewski . . . Tag your question with the database you are using. – Gordon Linoff May 06 '21 at 10:58

2 Answers2

3

This is a type of gaps-and-islands problem. I recommend an approach like this:

SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to)   AS date_to
FROM (SELECT td.*,
             SUM(CASE WHEN prev_date_to >= date_from THEN 0 ELSE 1 END) OVER (PARTITION BY product ORDER BY date_to) as grp
      FROM (SELECT td.*,
                   MAX(date_to) OVER (PARTITION BY product ORDER BY date_from ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as prev_date_to
            FROM test_data td
           ) td
     ) td
GROUP BY grp, product
ORDER BY product, MIN(date_from);

Here is a db<>fiddle.

What is this doing? The innermost subquery is getting the latest date_to on previous rows. This is used to determine if each row is "connected" to the previous row or if it starts a new grouping.

The middle subquery has logic which is a cumulative sum of when the rows start a new group. The outer query then aggregates by this grouping.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

Merging of date ranges could be achieved with MATCH_RECOGNIZE.

Data preparation:

CREATE OR REPLACE TABLE test_data AS
SELECT '2020-01-01'::DATE AS date_from, '2020-01-03'::DATE AS date_to, '1'  AS product
UNION ALL
SELECT '2020-01-05'::DATE AS date_from, '2020-01-07'::DATE AS date_to, '1'  AS product
UNION ALL
SELECT '2020-01-06'::DATE AS date_from, '2020-01-10'::DATE AS date_to, '1' AS product;

Query:

SELECT * 
FROM test_data t
MATCH_RECOGNIZE(
  PARTITION BY product
  ORDER BY date_from, date_to
  MEASURES FIRST(date_from) date_from, MAX(date_to) date_to
  PATTERN(a* b)
  DEFINE a AS MAX(date_to) OVER() >= NEXT(date_from)
) mr;

db<>fiddle demo - Oracle

Related reading: Merging Overlapping Date Ranges with MATCH_RECOGNIZE by stewashton

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275