5

This seemed pretty straight forward initially, but has proved to be a real headache. Below is my table, data, expected output and SQL Fiddle of where I have got to in solving my problem.

Schema & Data:

CREATE TABLE IF NOT EXISTS `meetings` (
  `id` int(6) unsigned NOT NULL,
  `user_id` int(6) NOT NULL,
  `start_time` DATETIME,
  `end_time` DATETIME,
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `meetings` (`id`, `user_id`, `start_time`, `end_time`) VALUES
  ('0', '1', '2018-05-09 04:30:00', '2018-05-09 17:30:00'),
  ('1', '1', '2018-05-10 06:30:00', '2018-05-10 17:30:00'),
  ('2', '1', '2018-05-10 12:30:00', '2018-05-10 16:00:00'),
  ('3', '1', '2018-05-11 17:00:00', '2018-05-12 11:00:00'),
  ('4', '2', '2018-05-11 07:00:00', '2018-05-12 11:00:00'),
  ('5', '2', '2018-05-11 04:30:00', '2018-05-11 15:00:00');

What I would like to get from the above is total time worked outside of 09:00 to 17:00, grouped by day and user_id. So the result from the above data would look like:

  date        | user_id | overtime_hours
  ---------------------------------------
  2018-05-09  | 1       | 05:00:00
  2018-05-10  | 1       | 03:00:00
  2018-05-11  | 1       | 07:00:00
  2018-05-12  | 1       | 09:00:00
  2018-05-11  | 2       | 13:30:00
  2018-05-12  | 2       | 09:00:00

As you can see the expected results are only summing overtime for each day and user for those hours outside of 9 to 5.

Below is the query and SQL Fiddle of where I am. The main issue comes when the start and ends straddle midnight (or multiple midnight's)

SELECT
    SEC_TO_TIME(SUM(TIME_TO_SEC(TIME(end_time)) - TIME_TO_SEC(TIME(start_time)))), user_id, DATE(start_time)
FROM
(SELECT 
    start_time, CASE WHEN TIME(end_time) > '09:00:00' THEN DATE_ADD(DATE(end_time), INTERVAL 9 HOUR) ELSE end_time END AS end_time, user_id
FROM
    meetings
WHERE
    TIME(start_time) < '09:00:00'

UNION

SELECT 
    CASE WHEN TIME(start_time) < '17:00:00' THEN DATE_ADD(DATE(start_time), INTERVAL 17 HOUR) ELSE start_time END AS start_time, end_time, user_id
FROM
    meetings
WHERE
    TIME(end_time) > '17:00:00') AS clamped_times
GROUP BY user_id, DATE(start_time)

http://sqlfiddle.com/#!9/77bc85/1

Pastebin for when the fiddle decides to flake: https://pastebin.com/1YvLaKbT

As you can see the query grabs the easy overtime with start and ends on the same day, but does not work with the multiple day ones.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
superphonic
  • 7,954
  • 6
  • 30
  • 63
  • 6
    If only every question could be this well presented. – Strawberry Nov 19 '18 at 16:28
  • Sounds like this could be accomplished by sorting on both day and user_id – Dom Nov 19 '18 at 16:30
  • The fiddle isn't loading at the moment, so maybe you already have this, but if not, here's where I would start... `SELECT id,TIMEDIFF(DATE_FORMAT(start_time, '%Y-%m-%d 09:00:00'),start_time) n FROM meetings;` – Strawberry Nov 19 '18 at 16:34
  • @Strawberry Yeah sqlfiddle has been playing up today something chronic. Try and let it time out and then reload the page. I don't want to bloat the question with the query. I'll add a paste bin link to it. – superphonic Nov 19 '18 at 16:38
  • Cheers @Strawberry – superphonic Nov 19 '18 at 16:44
  • I think you just need to think about your CASE statements a little more carefully. – Strawberry Nov 19 '18 at 16:46
  • 1
    @superphonic - sqlfiddle was messing up the other day as well when I tried it. Maybe their database has gotten corrupted ;). – dcp Nov 19 '18 at 16:46

1 Answers1

2

If the meeting is going to span across n days, and you are looking to compute "work hours" daywise within a particular meeting; it rings a bell, that we can use a number generator table.

(SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen

We will use the number generator table to consider separate rows for the individual dates ranging from the start_time to end_time. For this case, I have assumed that it is unlikely that meeting will span across more than 2 days. If it happens to span more number of days, you can easily extend the range by adding more UNION ALL SELECT 3 .. to the ngen Derived Table.

Based on this, we will determine "start time" and "end time" to consider for a specific "work date" in an ongoing meeting. This calculation is being done in a Derived Table, for a grouping of user_id and "work date".

Afterwards, we can SUM() up "working hours" per day of a user using some maths. Please find the query below. I have added extensive comments to it; do let me know if anything is still unclear.


Demo on DB Fiddle

Query #1

SELECT 
  dt.user_id, 
  dt.wd AS date, 

  SEC_TO_TIME(SUM(

      CASE 
        /*When both start & end times are less than 9am OR more than 5pm*/
        WHEN (st < TIME_TO_SEC('09:00:00') AND et < TIME_TO_SEC('09:00:00')) OR 
             (st > TIME_TO_SEC('17:00:00') AND et > TIME_TO_SEC('17:00:00'))
        THEN et - st  /* straightforward difference between the two times */

        /* atleast one of the times is in 9am-5pm block, OR, 
           start < 9 am and end > 5pm.
           Math of this can be worked out based on signum function */
        ELSE GREATEST(0, TIME_TO_SEC('09:00:00') - st) + 
             GREATEST(0, et - TIME_TO_SEC('17:00:00'))

      END
  )) AS working_hours  

FROM 
(

 SELECT 
   m.user_id, 

   /* Specific work date */
   DATE(m.start_time) + INTERVAL ngen.gap DAY AS wd, 

   /* Start time to consider for this work date */
   /* If the work date is on the same date as the actual start time
      we consider this time */
   CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.start_time) 
             THEN TIME_TO_SEC(TIME(m.start_time))

        /* We are on the days after the start day */
        ELSE 0  /* 0 seconds (start of the day) */
   END AS st, 

   /* End time to consider for this work date */
   /* If the work date is on the same date as the actual end time
      we consider this time */
   CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.end_time) 
             THEN TIME_TO_SEC(TIME(m.end_time)) 

        /* More days to come still for this meeting, 
           we consider the end of this day as end time */
        ELSE 86400  /* 24 hours * 3600 seconds (end of the day) */
   END AS et

 FROM meetings AS m 
 JOIN (SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen
   ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)

) AS dt 
GROUP BY dt.user_id, dt.wd;

Result

| user_id | date       | working_hours |
| ------- | ---------- | ------------- |
| 1       | 2018-05-09 | 05:00:00      |
| 1       | 2018-05-10 | 03:00:00      |
| 1       | 2018-05-11 | 07:00:00      |
| 1       | 2018-05-12 | 09:00:00      |
| 2       | 2018-05-11 | 13:30:00      |
| 2       | 2018-05-12 | 09:00:00      |

Further Optimization Possibilities:

  1. This query can do away with the usage of subquery (Derived Table) very easily. I just wrote it in this way, to convey the mathematics and process in a followable manner. However, you can easily merge the two SELECT blocks to a single query.
  2. Maybe, more optimization possible in usage of Date/Time functions, as well as further simplification of mathematics in it. Function details available at: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
  3. Some date calculations are done multiple times, e.g., DATE(m.start_time) + INTERVAL ngen.gap DAY. To avoid recalculation, we can utilize User-defined variables, which will also make the query less verbose.
  4. Make this JOIN condition sargable: JOIN .. ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
  • Perfect answer. I also appreciate the extra effort of the of the commenting and explanation. Thanks – superphonic Nov 26 '18 at 11:39
  • 1
    @superphonic thanks. This query can be condensed further, as mentioned by me in "Further Optimization Possibilities" at the end. I have left that part to you; if you still need assistance, do let me know. – Madhur Bhaiya Nov 26 '18 at 11:40