0

I have a booking table with the following columns:

id, start_date, end_date

I want to know which days have had the most bookings over my dataset.

I can use dayofweek() on the start date and group by this also and use a count(*). But I also want to include the days between the start of booking and end.

An example output wouldbe

dayofweek     count(*)
1             1
2             1
3             1
4             2
5             3
6             3
7             1

for the following set

id       start_date          end_date
1        2017-10-01          2017-10-07
2        2017-10-04          2017-10-07
3        2017-10-06          2017-10-08
Strawberry
  • 33,750
  • 13
  • 40
  • 57
Ostrich-39
  • 25
  • 1
  • 4

2 Answers2

0

You can accomplish this with a recursive table:

WITH cte AS
(
    SELECT DATE_ADD(start_date INTERVAL 1 DAY) AS date, end_date, DAYOFWEEK(start_date) AS dw from bookings
    UNION ALL 
    SELECT DATE_ADD(start_date INTERVAL 1 DAY), end_date, DAYOFWEEK(date)
    FROM cte WHERE date <= end_date
)
SELECT COUNT(*), dw FROM cte GROUP BY dw
zambonee
  • 1,599
  • 11
  • 17
0

I am assuming you wish to know something like how many rooms are filled for each date for the duration between the start and end. The"trick" here is that a long period between start/end will repeat the day or week and/or that the end day of week might be smaller than the start day of week. So, I have:

  1. generated a list of 100,000 dates (1 per row)
  2. joined those dates between the start/end of your table
  3. converted each joined rows to a day of week number to be counted
  4. left joined to a list of 1 to 7, and counted the rows of step 3

NOTE: if the end_date is a "check out date" then it may be necessary to deduct 1 day from each record to compensate (which is not done below).

This approach is available for review here at SQL Fiddle

MySQL 5.6 Schema Setup:

CREATE TABLE Table1
    (`id` int, `start_date` datetime, `end_date` datetime)
;

INSERT INTO Table1
    (`id`, `start_date`, `end_date`)
VALUES
    (1, '2017-09-21 00:00:00', '2017-10-07 00:00:00'), ## added this row
    (1, '2017-10-01 00:00:00', '2017-10-07 00:00:00'),
    (2, '2017-10-04 00:00:00', '2017-10-07 00:00:00'),
    (3, '2017-10-06 00:00:00', '2017-10-08 00:00:00')
;

Query:

set @commence := str_to_date('2000-01-01','%Y-%m-%d')

select
    w.dy
  , count(t.wdy)
from (
      select 1 dy union all select 2 dy union all select 3 dy union all
      select 4 dy union all select 5 dy union all select 6 dy union all select 7 dy
      ) w
left join (
      select DAYOFWEEK(cal.dy) wdy
      from (
              select adddate( @commence ,t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) dy 
              from  (     select 0 i union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t0 
              cross join (select 0 i union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t1 
              cross join (select 0 i union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2 
              cross join (select 0 i union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3 
              cross join (select 0 i union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4
          ) cal
      INNER JOIN Table1 t on cal.dy between t.start_date and t.end_date
      ) t on w.dy = t.wdy
group by
    w.dy

Results:

| dy | count(t.wdy) |
|----|--------------|
|  1 |            4 |
|  2 |            3 |
|  3 |            3 |
|  4 |            4 |
|  5 |            5 |
|  6 |            6 |
|  7 |            6 |

Also see: How to get list of dates between two dates in mysql select query where the accepted answer is the basis for the set of cross joins that produces 100,000 dates starting from a nominated date. I modified that however for syntax (explicit cross join syntax), a parameter as start point, and use of union all for efficiency.

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • This works! However I have a test table with 100 rows and it currently takes 20 seconds, how well would this query run if say I had 100,000? – Ostrich-39 Oct 07 '17 at 07:18
  • a. I cannot really predict that, and b. it most probably depend on your indexes, and c. instead of a dynamically generated list of dates you could create a table instead and index that also. Always refer to explain plans for performance details. If asking a new question about performance include full DDL of tables and related indexes. PLUS the explain plan output (as text) – Paul Maxwell Oct 07 '17 at 07:47