
I have a table trips in PostgreSQL 10.5:

id  start_date    end_date
----------------------------
1   02/01/2019    02/03/2019
2   02/02/2019    02/03/2019
3   02/06/2019    02/07/2019
4   02/06/2019    02/14/2019
5   02/06/2019    02/06/2019

I want to count the number of days in trips that overlap with given weeks. Trips in the table have inclusive bounds. Weeks start on Monday and end on Sunday. The expected result would be:

week_of    days_utilized
------------------------
01/28/19    5
02/04/19    8
02/11/19    4

For a calendar reference:

Monday 01/28/19 - Sunday 02/03/19
Monday 02/04/19 - Sunday 02/10/19
Monday 02/11/19 - Sunday 02/17/19

I know how to write this in the programming language I use, but I'd prefer to do this in Postgres and I'm unclear where to start ...
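For reference, a minimal setup that reproduces the sample data (assuming plain date columns; adjust if your actual types differ):

CREATE TABLE trips (
  id         int PRIMARY KEY
, start_date date NOT NULL
, end_date   date NOT NULL
);

INSERT INTO trips (id, start_date, end_date) VALUES
  (1, '2019-02-01', '2019-02-03')
, (2, '2019-02-02', '2019-02-03')
, (3, '2019-02-06', '2019-02-07')
, (4, '2019-02-06', '2019-02-14')
, (5, '2019-02-06', '2019-02-06');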


2 Answers


You seem to want generate_series(), a join, and group by. To count the trips that overlap each week:

select gs.wk, count(t.id) as num_trips
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
     trips t
     on gs.wk <= t.end_date and
        gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;

EDIT:

I see you want the days covered. This is slightly more work in the aggregation:

select gs.wk, count(t.id) as num_trips,
       sum( 1 +
            extract(day from (least(gs.wk + interval '6 day', t.end_date) - greatest(gs.wk, t.start_date)))
          ) as days_utilized
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
     trips t
     on gs.wk <= t.end_date and
        gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
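
To illustrate the day arithmetic, here is a quick check (my own sketch, not part of the answer) for one combination from the sample data: trip 4 (02/06 - 02/14) against the week starting 02/11. The leading 1 + accounts for the inclusive bounds:

select 1 + extract(day from (least(timestamp '2019-02-11' + interval '6 day', date '2019-02-14')
                           - greatest(timestamp '2019-02-11', date '2019-02-06'))) as days;
-- least() picks 2019-02-14, greatest() picks 2019-02-11: 1 + 3 days = 4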

Note: This doesn't return exactly the results you have. I think these are correct.

Gordon Linoff
  • I plugged this in and it seems to work, and yes, my numbers in the example were slightly off, apologies. I'm going to manually check this against our Prod data, but this is enough to get me going regardless. Thank you so much for taking the time to answer! – hummmingbear Mar 28 '19 at 23:02

Consider range types for this. Makes the computations simpler and clearer with range operators. I use the overlap operator && and the intersection operator * below. Support that with a functional GiST or SP-GiST index to make queries fast - if the table is big. Like:

CREATE INDEX trip_range_idx ON trips
USING gist (daterange(start_date, end_date, '[]'));

Then your query can use this index:

SELECT week
     , count(overlap)                       AS ct_trips
     , sum(upper(overlap) - lower(overlap)) AS days_utilized
FROM  (
   SELECT week, trip * week AS overlap
   FROM  (
      SELECT daterange(mon::date, mon::date + 7) AS week
      FROM   generate_series(timestamp '2019-01-28'
                           , timestamp '2019-02-11'
                           , interval  '1 week') mon
      ) w
   LEFT   JOIN (SELECT daterange(start_date, end_date, '[]') FROM trips) t(trip) ON trip && week
   ) sub
GROUP  BY 1
ORDER  BY 1;

db<>fiddle here

By default, a daterange consists of an inclusive lower and an exclusive upper bound. Your ranges include upper and lower bounds, so create the daterange with daterange(start_date, end_date, '[]'). The function upper() still returns the exclusive upper bound. Hence the expression upper(overlap) - lower(overlap) does the right thing to count days.
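
A quick illustration (my addition, using the dates of trip 1 from the question):

SELECT upper(r) - lower(r) AS days   -- 3
     , upper(r)            AS ub     -- 2019-02-04, the exclusive upper bound
FROM  (SELECT daterange(date '2019-02-01', date '2019-02-03', '[]') AS r) t;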

There is a reason I use generate_series() with timestamp input: there is no date variant of the function, so date input would be cast to timestamp anyway.


Or, if you don't want to use range types, consider the OVERLAPS operator:
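
A minimal sketch of that alternative (not from the linked answer): OVERLAPS treats each period as half-open, so one day is added to the inclusive end date:

SELECT *
FROM   trips t
WHERE  (t.start_date, t.end_date + 1) OVERLAPS (date '2019-01-28', date '2019-01-28' + 7);
-- trips touching the week of Mon 2019-01-28 .. Sun 2019-02-03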

Erwin Brandstetter
  • I gave this a try, and I'm getting some pretty inflated numbers. The only difference from your table to mine is that `start_date` and `end_date` are datetime; I had to cast them to date to make your query work, but again the numbers I'm getting are extremely inflated. The index you created is only for speed, correct? Any thoughts? – hummmingbear Mar 29 '19 at 01:36
  • This could make sense with small tables, where performance is not critical, for clear code. Or for small selections from big tables with index support. Else, forming ranges is just overhead slowing the query down. – Erwin Brandstetter Mar 29 '19 at 02:04