
Let's say you have the following PostgreSQL sparse table listing reservation dates:

CREATE TABLE reserved_dates (
    reserved_days_id    SERIAL  NOT NULL,
    reserved_date       DATE    NOT NULL
);

INSERT INTO reserved_dates (reserved_date) VALUES
    ('2014-10-11'),
    ('2014-10-12'),
    ('2014-10-13'),
    -- gap
    ('2014-10-15'),
    ('2014-10-16'),
    -- gap
    ('2014-10-18'),
    -- gap
    ('2014-10-20'),
    ('2014-10-21');

How do you aggregate those dates into continuous date ranges (i.e. ranges without gaps), such as:

 start_date | end_date
------------+------------
 2014-10-11 | 2014-10-13
 2014-10-15 | 2014-10-16
 2014-10-18 | 2014-10-18
 2014-10-20 | 2014-10-21

This is what I came up with so far, but I can only get start_date this way:

WITH reserved_date_ranges AS (
    SELECT reserved_date,
           reserved_date
           - LAG(reserved_date) OVER (ORDER BY reserved_date) AS difference
    FROM reserved_dates
)
SELECT *
FROM reserved_date_ranges
WHERE difference > 1 OR difference IS NULL;

1 Answer

SELECT min(reserved_date) AS start_date
     , max(reserved_date) AS end_date
FROM  (
   SELECT reserved_date
        , reserved_date - row_number() OVER (ORDER BY reserved_date)::int AS grp
   FROM   reserved_dates
   ) sub 
GROUP  BY grp
ORDER  BY grp;
  1. Compute gap-less serial numbers in chronological order with the window function row_number(). Duplicate dates are not allowed. (I added a UNIQUE constraint in the fiddle.)

    If your reserved_days_id happens to be gap-less and in chronological order, you can use that directly instead. But that's typically not the case.

  2. Subtract that (after casting it to integer) from reserved_date in each row. Consecutive days end up with the same date value grp, which has no purpose or meaning other than to form groups.

  3. Aggregate in the outer query. Voilà.
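The steps above can be run end to end as a small sketch. This uses Python's bundled sqlite3 as a stand-in for PostgreSQL (an assumption for portability; it needs SQLite 3.25+ for window functions), with julianday() truncated to an integer replacing Postgres's native date-minus-date arithmetic:

```python
# Runnable sketch of the row_number() gaps-and-islands technique.
# sqlite3 stands in for PostgreSQL; CAST(julianday(d) AS INTEGER)
# yields a day number, mimicking Postgres date subtraction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reserved_dates (reserved_date TEXT NOT NULL);
    INSERT INTO reserved_dates VALUES
        ('2014-10-11'), ('2014-10-12'), ('2014-10-13'),
        ('2014-10-15'), ('2014-10-16'),
        ('2014-10-18'),
        ('2014-10-20'), ('2014-10-21');
""")

rows = conn.execute("""
    SELECT min(reserved_date) AS start_date,
           max(reserved_date) AS end_date
    FROM (
        SELECT reserved_date,
               -- day number minus row number: constant within each island
               CAST(julianday(reserved_date) AS INTEGER)
                 - row_number() OVER (ORDER BY reserved_date) AS grp
        FROM reserved_dates
    ) sub
    GROUP BY grp
    ORDER BY grp
""").fetchall()

for start_date, end_date in rows:
    print(start_date, end_date)
# 2014-10-11 2014-10-13
# 2014-10-15 2014-10-16
# 2014-10-18 2014-10-18
# 2014-10-20 2014-10-21
```

ISO-formatted date strings sort lexicographically in chronological order, which is why min()/max() on TEXT works here; in PostgreSQL you would keep the DATE type and the original query verbatim.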

db<>fiddle here
Old sqlfiddle


  • This answer seems hard to apply in the general case for time series with a set resolution, e.g. every 2 days or every 10 minutes. Timestamps would need to be cast to real values, and floating-point operations aren't precise. Are there alternatives? – mlt Mar 09 '18 at 20:55
  • 1
    I take my words back. This solution works with timestamp columns akin to `extract(epoch from timestamp) - extract(epoch from interval '1day')*row_number() over (order by timestamp)` – mlt Mar 09 '18 at 21:18
  • After all these years, I realized it is not duplicate-proof https://www.db-fiddle.com/f/31UdD6udUZqwQk43pRbGKZ/1 – mlt Jun 20 '20 at 05:46
  • 2
    Use `dense_rank` instead of `row_number` if you happen to have duplicates. – mlt Jun 20 '20 at 23:23
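The last two comments can be demonstrated side by side. A duplicate date makes row_number() advance past the island's position while dense_rank() does not, so only dense_rank() keeps each island intact. Again a sketch using sqlite3 in place of PostgreSQL (an assumption; SQLite 3.25+ needed), with a hypothetical helper islands() parameterized by the ranking function purely for illustration:

```python
# Sketch: duplicates break row_number()-based grouping; dense_rank() survives.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reserved_dates (reserved_date TEXT NOT NULL);
    INSERT INTO reserved_dates VALUES
        ('2014-10-11'), ('2014-10-12'), ('2014-10-12'),  -- duplicate day
        ('2014-10-13'),
        ('2014-10-15');
""")

def islands(rank_fn):
    # rank_fn is interpolated for demonstration only -- never build SQL
    # from untrusted input like this in real code.
    return conn.execute(f"""
        SELECT min(reserved_date), max(reserved_date)
        FROM (
            SELECT reserved_date,
                   CAST(julianday(reserved_date) AS INTEGER)
                     - {rank_fn}() OVER (ORDER BY reserved_date) AS grp
            FROM reserved_dates
        ) sub
        GROUP BY grp
        ORDER BY grp
    """).fetchall()

# row_number() assigns the duplicate its own rank, shifting every later
# row's grp value and scrambling the islands:
print(islands("row_number"))

# dense_rank() gives equal dates equal ranks, so the islands come out right:
print(islands("dense_rank"))
# [('2014-10-11', '2014-10-13'), ('2014-10-15', '2014-10-15')]
```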