
I have a table trips in PostgreSQL 10.5:

id  start_date    end_date
----------------------------
1   02/01/2019    02/03/2019
2   02/02/2019    02/03/2019
3   02/06/2019    02/07/2019
4   02/06/2019    02/14/2019
5   02/06/2019    02/06/2019

I want to count the number of days in trips that overlap with given weeks. Trips in the table have inclusive bounds. Weeks start on Monday and end on Sunday. The expected result would be:

week_of    days_utilized
------------------------
01/28/19    5
02/04/19    8
02/11/19    4

For a calendar reference:

Monday 01/28/19 - Sunday 02/03/19
Monday 02/04/19 - Sunday 02/10/19
Monday 02/11/19 - Sunday 02/17/19

I know how to write this in the programming language I use, but I'd prefer to do this in Postgres and I'm unclear where to start ...
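For reference, a minimal setup that reproduces the sample data (assuming plain date columns; adjust if your actual types differ):

CREATE TABLE trips (
  id         int PRIMARY KEY
, start_date date NOT NULL
, end_date   date NOT NULL
);

INSERT INTO trips (id, start_date, end_date) VALUES
  (1, '2019-02-01', '2019-02-03')
, (2, '2019-02-02', '2019-02-03')
, (3, '2019-02-06', '2019-02-07')
, (4, '2019-02-06', '2019-02-14')
, (5, '2019-02-06', '2019-02-06');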


2 Answers


You seem to want generate_series(), a join, and group by. To count the trips that overlap each week:

select gs.wk, count(t.id) as num_trips
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
     trips t
     on gs.wk <= t.end_date and
        gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;

EDIT:

I see you want the days covered. This is slightly more work in the aggregation:

select gs.wk, count(t.id) as num_trips,
       sum( 1 +
            extract(day from (least(gs.wk + interval '6 day', t.end_date) - greatest(gs.wk, t.start_date)))
          ) as days_utilized
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
     trips t
     on gs.wk <= t.end_date and
        gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
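
To illustrate the day arithmetic, here is a quick check (my own sketch, not part of the answer) for one combination from the sample data: trip 4 (02/06 - 02/14) against the week starting 02/11. The leading 1 + accounts for the inclusive bounds:

select 1 + extract(day from (least(timestamp '2019-02-11' + interval '6 day', date '2019-02-14')
                           - greatest(timestamp '2019-02-11', date '2019-02-06'))) as days;
-- least() picks 2019-02-14, greatest() picks 2019-02-11: 1 + 3 days = 4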

Note: This doesn't return exactly the results you have. I think these are correct.

Gordon Linoff
  • I plugged this in and it seems to work, and yes, my numbers in the example were slightly off, apologies. I'm going to manually check this against our Prod data, but this is enough to get me going regardless. Thank you so much for taking the time to answer! – hummmingbear Mar 28 '19 at 23:02

Consider range types for this. Makes the computations simpler and clearer with range operators. I use the overlap operator && and the intersection operator * below. Support that with a functional GiST or SP-GiST index to make queries fast - if the table is big. Like:

CREATE INDEX trip_range_idx ON trips
USING gist (daterange(start_date, end_date, '[]'));

Then your query can use this index:

SELECT week
     , count(overlap)                       AS ct_trips
     , sum(upper(overlap) - lower(overlap)) AS days_utilized
FROM  (
   SELECT week, trip * week AS overlap
   FROM  (
      SELECT daterange(mon::date, mon::date + 7) AS week
      FROM   generate_series(timestamp '2019-01-28'
                           , timestamp '2019-02-11'
                           , interval  '1 week') mon
      ) w
   LEFT   JOIN (SELECT daterange(start_date, end_date, '[]') FROM trips) t(trip) ON trip && week
   ) sub
GROUP  BY 1
ORDER  BY 1;

db<>fiddle here

By default, a daterange consists of an inclusive lower and an exclusive upper bound. Your ranges include upper and lower bounds, so create the daterange with daterange(start_date, end_date, '[]'). The function upper() still returns the exclusive upper bound. Hence the expression upper(overlap) - lower(overlap) does the right thing to count days.
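
A quick illustration (my addition, using the dates of trip 1 from the question):

SELECT upper(r) - lower(r) AS days   -- 3
     , upper(r)            AS ub     -- 2019-02-04, the exclusive upper bound
FROM  (SELECT daterange(date '2019-02-01', date '2019-02-03', '[]') AS r) t;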

There is a reason I use generate_series() with timestamp input: there is no date variant of the function, so date input would be cast to timestamp anyway.


Or, if you don't want to use range types, consider the OVERLAPS operator:
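
A minimal sketch of that alternative (not from the linked answer): OVERLAPS treats each period as half-open, so one day is added to the inclusive end date:

SELECT *
FROM   trips t
WHERE  (t.start_date, t.end_date + 1) OVERLAPS (date '2019-01-28', date '2019-01-28' + 7);
-- trips touching the week of Mon 2019-01-28 .. Sun 2019-02-03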

Erwin Brandstetter
  • I gave this a try, and I'm getting some pretty inflated numbers. The only difference from your table to mine is that `start_date` and `end_date` are datetime; I had to cast them to date to make your query work, but again the numbers I'm getting are extremely inflated. The index you created is only for speed, correct? Any thoughts? – hummmingbear Mar 29 '19 at 01:36
  • This could make sense with small tables, where performance is not critical, for clear code. Or for small selections from big tables with index support. Else, forming ranges is just overhead slowing the query down. – Erwin Brandstetter Mar 29 '19 at 02:04