5

I need to count the total amount of time spent based on a table like:

id | start_time | end_time |

where periods can overlap. I need to count ovelpapping periods only once.

E.g. if I have periods like these:

*----A----*              *------C-----* *----------D----------*
                  *-----B-----*              *---E---*

the sum will be: (A.end-A.start) + (C.end - B.start) + (D.end - D.start)

I'm a bit confused with the approach I should use to write this query and will be grateful for help.

svz
  • 4,516
  • 11
  • 40
  • 66

3 Answers3

2

Ok, I seriously insist you test this in all manners before using it in production. Especially test what happens if there are MULTIPLE overlaps for 1 time span.

What this query does is calculate the duration of each time span, and how much overlap exists with other time spans that have a higher id.

select
    t1.id,
    t1.start_time,
    t1.end_time,
    t1.end_time - t1.start_time as duration,
    sum(
          if(t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  , t1.end_time - t1.start_time, 0) -- t2 completely around t1
        + if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  , t2.end_time - t2.start_time, 0) -- t2 completely within t1
        + if(t2.start_time <  t1.start_time and t2.end_time >  t1.start_time and t2.end_time   < t1.end_time  , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
        + if(t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time   and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
    ) as overlap
from
    times t1
    left join times t2 on
        t2.id > t1.id --  t2.id is greater than t1.id
        and (
               (t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  ) -- t2 completely around t1
            or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  ) -- t2 completely within t1
            or (t2.start_time <  t1.start_time and t2.end_time >  t1.start_time) -- t2 starts before t1 starts and overlaps
            or (t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time  ) -- t2 starts before t1 ends and overlaps
        )
group by
    t1.id

So what you want to have in the end is this:

select
    sum(t.duration) - sum(t.overlap) as filtered_duration
from
    (
        OTHER QUERY HERE
    ) as t

So in the end you have this query:

select
    sum(t.duration) - sum(t.overlap) as filtered_duration
from
    (
        select
            t1.id,
            t1.start_time,
            t1.end_time,
            t1.end_time - t1.start_time as duration,
            sum(
                  if(t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  , t1.end_time - t1.start_time, 0) -- t2 completely around t1
                + if(t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  , t2.end_time - t2.start_time, 0) -- t2 completely within t1
                + if(t2.start_time <  t1.start_time and t2.end_time >  t1.start_time and t2.end_time   < t1.end_time  , t2.end_time - t1.start_time, 0) -- t2 starts before t1 starts and overlaps partially
                + if(t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time   and t2.start_time > t1.start_time, t1.end_time - t2.start_time, 0) -- t2 starts before t1 ends and overlaps partially
            ) as overlap
        from
            times t1
            left join times t2 on
                t2.id > t1.id --  t2.id is greater than t1.id
                and (
                       (t2.start_time <  t1.start_time and t2.end_time >  t1.end_time  ) -- t2 completely around t1
                    or (t2.start_time >= t1.start_time and t2.end_time <= t1.end_time  ) -- t2 completely within t1
                    or (t2.start_time <  t1.start_time and t2.end_time >  t1.start_time) -- t2 starts before t1 starts and overlaps
                    or (t2.start_time <  t1.end_time   and t2.end_time >  t1.end_time  ) -- t2 starts before t1 ends and overlaps
                )
        group by
            t1.id
    ) as t
nl-x
  • 11,762
  • 7
  • 33
  • 61
  • Thanks, I'll give it a try! Will accept it as soon as I make it work. – svz Apr 03 '13 at 10:59
  • @svz I just tested, and it does break when there are 3 time spans active at the same moment, because the overlapped timespan is then substracted twice. – nl-x Apr 03 '13 at 11:02
  • But I think this is as good as it gets though. So I don't know if it is a possibility in your case that you could have 3 time spans simultaneously. – nl-x Apr 03 '13 at 11:36
  • Thanks for help, anyway. I'll try to do something about it. – svz Apr 03 '13 at 11:37
2

I'd like to suggest an other way to get the time, while being sure, that the result is correct. But I don't know, how to get this done with MySQL in complete.

I'll reuse the above example with the following hours - and there might even be a 3rd-level entry "F":

1         3              7           12 13    (15 16)        20
|----A----|              |------C-----| |----------D----------|
                  |-----B-----|              |---E---|
                  5           9              14     17
                                                |F|
  1. Query a combined list of all timestamps ordered by time and add the type of each "action"

    SELECT 1 as onoff, start_time as time FROM table
    UNION
    SELECT -1 as onoff, end_time as time FROM table
    ORDER BY time
    
  2. Process the list by a loop (?) with a temporary counter, which increments by 1 on starts/logins and decrements by 1 on ends/logouts

    The counter should cause the script to add a new row to a temporary table with tmp.start=<time>, if it changes from 0 to 1 and update tmp.end=<time> on the previous row in the temp. table, if it changes from 1 to 0.

    The script would do that for the example above like follows:

    QUERY                       TMP TABLE
    onoff | time  | ctr         ID | start | end
    1     | 01:00 | 1           1  | 01:00 |            (record 1 added,   ctr 0->1)
    -1    | 03:00 | 0           1  | 01:00 | 03:00      (record 1 updated, ctr 1->0)
    1     | 05:00 | 1           2  | 05:00 |            (record 2 added,   ctr 0->1)
    1     | 07:00 | 2                                   (nothing to do)
    -1    | 09:00 | 1                                   (nothing to do)
    -1    | 12:00 | 0           2  | 05:00 | 12:00      (record 2 updated, ctr 1->0)
    1     | 13:00 | 1           3  | 13:00 |            (record 3 added,   ctr 0->1)
    1     | 14:00 | 2                                   (nothing to do)
    1     | 15:00 | 3                                   (nothing to do)
    -1    | 16:00 | 2                                   (nothing to do)
    -1    | 17:00 | 1                                   (nothing to do)
    -1    | 20:00 | 0           3  | 13:00 | 20:00      (record 3 updated, ctr 1->0)
    
  3. The last step is very easy: Get the timestampdiff() from start to end in the unit, you need/like it to be and do any further filtering or grouping.

    For example: To use the data elsewhere

    SELECT ID, start, end, timestampdiff(MINUTE, start, end) FROM tmp
    

    or for example: Sum up worktime / time logged in per user

    SELECT user_id, SUM(timestampdiff(MINUTE, start, end)) FROM tmp GROUP BY user_id
    

I'm sure, this will give the correct duration for any level of nesting, but does anyone know how to accomplish this in MySQL? I'd like to use this, too.

Best regards

PS: The script might also "close" the last session or throw an error, if it ends with counter > 1 and throw an error, if the counter becomes < 0 at any time

1

I wrote a similar query for another question so I thought I'd adjust it for this question incase anyone is interested.

SELECT SUM(a.end_time - a.start_time) total_duration
  FROM (
    SELECT MIN(g.start_time) start_time, MAX(g.end_time) end_time 
      FROM (
        SELECT @group_id := @group_id + (@end_time IS NULL OR o.start_time > @end_time) group_id,
               start_time,
               @end_time := CAST(CASE 
                 WHEN (@end_time IS NULL OR o.start_time > @end_time) THEN o.end_time
                 ELSE GREATEST(o.end_time, @end_time)
               END AS DATETIME) end_time  
          FROM times o
          JOIN (SELECT @group_id := 0, @end_time := NULL) init
      ORDER BY o.start_time ASC  
            ) g
  GROUP BY  g.group_id  
        ) a

The inner-most query groups together your times in overlapping groups stretching the end_time where appropriate. The end_time flexes to deal with times completely enclosed by the previous.

The next wrapping query extracts the full time range from each group.

The outer query sums up the diffs for each group.

Community
  • 1
  • 1
Arth
  • 12,789
  • 5
  • 37
  • 69