1

I have a table of events each with a given start and end date.

CREATE TABLE dataset (    
id int(11) unsigned NOT NULL AUTO_INCREMENT,
event varchar(64) NOT NULL DEFAULT '',
valid_from date DEFAULT NULL,
valid_to date DEFAULT NULL,
PRIMARY KEY (id)
);

INSERT INTO dataset (event, valid_from, valid_to) VALUES
('Test1', '2014-12-01', '2014-12-01'),
('Test2', '2014-12-02', '2014-12-02'),
('Test3', '2014-12-03', '2014-12-03'),
('Test4', '2014-12-04', '2014-12-04'),
('Test5', '2014-12-05', '2014-12-05'),
('Test6', '2014-12-01', '2014-12-01'),
('Test7', '2014-12-01', '2014-12-07');

I need help with a query to get the number of max concurrent events in a given range of dates. If start and end date are the same its working as expected with this query:

SET @d1 = '2014-12-01';
SET @d2 = '2014-12-01';
SELECT COUNT(*) as valid_events FROM dataset WHERE @d2 >= valid_from AND valid_to >= @d1;

This will return 3 which is true for the 1st of December it matches Test1, Test6 and Test7. But if I extend the range it stops working for me:

SET @d1 = '2014-12-01';
SET @d2 = '2014-12-07';
SELECT COUNT(*) as valid_events FROM dataset WHERE @d2 >= valid_from AND valid_to >= @d1;

This returns 7 because all the events overlap with the start and end date but I want them only to be counted if they overlap with each other as well.

            Test1 Test2 Test3 Test4 Test5 Test6 Test7  Result
2014-12-01    X                             X     X     3
2014-12-02          X                             X     2
2014-12-03                X                       X     2
2014-12-04                      X                 X     2
2014-12-05                            X           X     2
2014-12-06                                        X     1
2014-12-07                                        X     1

So if I request the max amount of concurrent events between 1st and 7th December I want to have the result of 3 and not 7.

  • Say, for the sake of StackOverflow's usefulness, could you write up your answer as an answer, and accept it? – O. Jones Dec 17 '14 at 12:04
  • Note that the overhead of a calendar table containing all probable dates is (depressingly) trivial! But purists dislike them, and I guess I can kind of see why. – Strawberry Dec 17 '14 at 12:14

3 Answers3

0

The maximum number of concurrent events occurs on one of the start dates. Hence, you can do this with some joins and aggregations:

select d.valid_from, count(*) as numoverlaps
from (select distinct d.valid_from from dataset d
     ) d join
     dataset d2
     on d.valid_from >= d2.valid_from and d.valid_from <= d2.valid_to
group by d.valid_from;

You can get the max by adding an order by and limit:

select d.*
from (select d.valid_from, count(*) as numoverlaps
      from (select distinct d.valid_from from dataset d
           ) d join
           dataset d2
           on d.valid_from >= d2.valid_from and d.valid_from <= d2.valid_to
      group by d.valid_from
     ) d
order by numoverlaps desc
limit 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0
 SELECT * FROM calendar WHERE dt BETWEEN  '2014-11-30' AND '2015-01-01';
+------------+
| dt         |
+------------+
| 2014-11-30 |
| 2014-12-01 |
| 2014-12-02 |
| 2014-12-03 |
| 2014-12-04 |
| 2014-12-05 |
| 2014-12-06 |
| 2014-12-07 |
| 2014-12-08 |
| 2014-12-09 |
| 2014-12-10 |
| 2014-12-11 |
| 2014-12-12 |
| 2014-12-13 |
| 2014-12-14 |
| 2014-12-15 |
| 2014-12-16 |
| 2014-12-17 |
| 2014-12-18 |
| 2014-12-19 |
| 2014-12-20 |
| 2014-12-21 |
| 2014-12-22 |
| 2014-12-23 |
| 2014-12-24 |
| 2014-12-25 |
| 2014-12-26 |
| 2014-12-27 |
| 2014-12-28 |
| 2014-12-29 |
| 2014-12-30 |
| 2014-12-31 |
| 2015-01-01 |
+------------+

SELECT x.dt
     , COUNT(*) total 
  FROM calendar x
  JOIN dataset y 
    ON x.dt BETWEEN y.valid_from AND y.valid_to 
 GROUP 
    BY dt;
+------------+-------+
| dt         | total |
+------------+-------+
| 2014-12-01 |     3 |
| 2014-12-02 |     2 |
| 2014-12-03 |     2 |
| 2014-12-04 |     2 |
| 2014-12-05 |     2 |
| 2014-12-06 |     1 |
| 2014-12-07 |     1 |
+------------+-------+

So just ORDER BY... LIMIT - you know the rest

Strawberry
  • 33,750
  • 13
  • 40
  • 57
0

The Tip from "Strawberry" helped me find a solution what I ended up with looks like this:

SET @d1 = '2014-12-01';
SET @d2 = '2014-12-07';

SELECT x.selected_date, COUNT(*) total 
  FROM (
    select * from 
    (select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) selected_date from
    (select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
    (select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
    (select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
    (select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
    (select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
    where selected_date between @d1 and @d2
  ) x
  JOIN dataset y 
    ON x.selected_date BETWEEN y.valid_from AND y.valid_to 
 GROUP 
    BY selected_date
 ORDER
    BY total DESC
 LIMIT 1;

It's basically his solution but without the need for a additional calendar table. To get a list of dates between two given dates I used this solution: https://stackoverflow.com/a/13814885/3342150

Thanks

Community
  • 1
  • 1