4

I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.

I'm trying to output the sum of overlapping times and the number of events that overlapped.

What is the most efficient / simple way to perform this query in MySQL?

CREATE TABLE IF NOT EXISTS `events` (
  `EventID` int(10) unsigned NOT NULL auto_increment,
  `StartTime` datetime NOT NULL,
  `EndTime` datetime default NULL,
  PRIMARY KEY  (`EventID`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;


INSERT INTO `events` (`EventID`, `StartTime`, `EndTime`) VALUES
(10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');


# if the query was run using the data above,
# the table below would be the desired output

# Number of Overlapped Events | Total Amount of Time those events overlapped.
1, 03:00:00
2, 02:00:00
3, 02:00:00

The purpose of these results is to generate a bill for hours used. (if you have one event running, you might pay 10 dollars per hour. But if two events are running, you only have to pay 8 dollars per hour, but only for the period of time you had two events running.)

maxsilver
  • 4,524
  • 4
  • 28
  • 24
  • This question lacks clarity. What is the use of such a question, and your proposal is flawed.. 3 hours - only one event was running (3a to 5a, and 9a to 10a) 2 hours - two events were (concurrently) running (5a to 7a) 2 hours - all three events were running (7a to 9a) Between 7 and 9 there were 3 concurrent events, so your middle bullet is wrongh and ,mneaningless. – Eddie Jul 20 '09 at 17:42
  • I apologize for the confusion, I have edited the question to improve clarity, and added the purpose behind the question. I don't understand what you mean, when you say the "proposal is flawed". You are absolutely right, between 7 and 9 there were 3 concurrent events, but my question already mentions this in the example above (that was line 3). The line you mentioned (Line 2) was for the period 5a to 7a, not 7a to 9a. I hope the included SQL clarifies this point. – maxsilver Jul 20 '09 at 23:12
  • +1 for your clarifications to the question, and especially for including the background for the question, SQL for the table structure and test data, expected output, and no unnecessary / irrelevant clutter. In its current form, I'd say it's one of the better questions I've seen on here. – Mark Byers Jan 25 '10 at 13:28

3 Answers3

5

Try this:

SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
    SELECT
        COUNT(*) AS `Count`,
        UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
    FROM (
        SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times1
    JOIN (
        SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times2
    ON Times1.rownum = Times2.rownum + 1
    JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
    GROUP BY Times1.rownum
) Totals
GROUP BY `Count`

Result:

1, 03:00:00
2, 02:00:00
3, 02:00:00

If this doesn't do what you want, or you want some explanation, please let me know. It could be made faster by storing the repeated subquery AllTimes in a temporary table, but hopefully it runs fast enough as it is.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • On my MySQL server, it wont run unless 'times2.rownum ' is changed to 'Times2.rownum' but otherwise, that's exactly what I was looking for! Works perfectly! Thanks! – maxsilver Jan 25 '10 at 13:15
  • Sorry about that - it was a typo, and it didn't fail for me so I didn't notice it. Fixed now. Glad you got it working! :) – Mark Byers Jan 25 '10 at 13:35
  • So apparently maxsilver ran the query on Linux and Mark on Windows. – Anax Feb 10 '10 at 12:12
0

Start with a table that contains a single datetime field as its primary key, and populate that table with every time value you're interested in. A leap years has 527040 minutes (31622400 seconds), so this table might get big if your events span several years.

Now join against this table doing something like

SELECT i.dt as instant, count(*) as events
FROM instant i JOIN event e ON i.dt BETWEEN e.start AND e.end
GROUP BY i.dt
WHERE i.dt BETWEEN ? AND ?

Having an index on instant.dt may let you forgo an ORDER BY.

If events are added infrequently, this may be something you want to precalculate by running the query offline, populating a separate table.

Dave W. Smith
  • 24,318
  • 4
  • 40
  • 46
-1

I would suggest an in-memory structure that has start-time,end-time,#events... (This is simplified as time(hours), but using unix time gives up to the second accuracy)

For every event, you would insert the new event as-is if there's no overlap, otherwise, find the overlap, and split the event to (up to 3) parts that may be overlapping, With your example data, starting from the first event:

Event 1 starts at 3am and ends at 10am: Just add the event since no overlaps:

    3,10,1

Event 2 starts at 5am and ends at 9am: Overlaps,so split the original, and add the new one with extra "#events"

    3,5,1
    5,9,2
    9,10,1

Event 3 starts at 7am and ends at 9am: also overlaps, do the same with all periods:

    3,5,1
    5,7,2
    7,9,3
    9,10,1

So calculating the overlap hours per #events:

1 event= (5-3)+(10-9)=3 hours
2 events = 7-5 = 2 hours
3 events = 9-7 = 2 hours

It would make sense to run this as a background process if there are many events to compare.

Osama Al-Maadeed
  • 5,654
  • 5
  • 28
  • 48