How can I make an SQL query that returns time differences between checkins and checkouts?

Question

I'm using mysql and I've got a table similar to this one:

id  | user  | task | time                  | checkout
----+-------+------+-----------------------+---------
1   | 1     | 1    | 2014-11-25 17:00:00   | 0
2   | 2     | 2    | 2014-11-25 17:00:00   | 0
3   | 1     | 1    | 2014-11-25 18:00:00   | 1
4   | 1     | 2    | 2014-11-25 19:00:00   | 0
5   | 2     | 2    | 2014-11-25 20:00:00   | 1
6   | 1     | 2    | 2014-11-25 21:00:00   | 1
7   | 1     | 1    | 2014-11-25 21:00:00   | 0
8   | 1     | 1    | 2014-11-25 22:00:00   | 1

id is just an autogenerated primary key, and checkout is 0 if that row registered a user checking in and 1 if the user was checking out from the task.

I would like to know how to make a query that returns how much time has a user spent at each task, that is to say, I want to know the sum of the time differences between the checkout=0 time and the nearest checkout=1 time for each user and task.

Edit: to make things clearer, the results I'd expect from my query would be:

user  | task | SUM(timedifference)
------+------+-----------------
1     | 1    | 02:00:00
1     | 2    | 02:00:00
2     | 2    | 03:00:00

I have tried using SUM(UNIX_TIMESTAMP(time) - UNIX_TIMESTAMP(time)), while grouping by user and task to figure out how much time had elapsed, but I don't know how to make the query only sum the differences between the particular times I want instead of all of them.

Can anybody help? Is this at all possible?

If i understood your problem correct, should the first difference than not be 01:00:00 ? user 1, task 1 checkout 0: 014-11-25 17:00:00 user 1, task 1 checkout 1: 014-11-25 18:00:00 — Dom84, Nov 26 '14 at 08:23
There is no unique identifier to determine a check in and check out period, user and task arnt unique — Matt, Nov 26 '14 at 08:26
Unless I made a mistake, my idea was that user 1 checks into task 1 at 17, then checks out at 18 (1 hour difference) but then checks into task 1 again at 21 and doesn't check out until 22 (1 hour difference again), so user 1 spent a total of 2 hours checked into task 1. — user1090729, Nov 26 '14 at 08:27
As @Matt said, using the current design there is no way to pair the checkins with checkouts. I would store both the checkin time and checkout time in the same row (`INSERT` on checkin, `UPDATE` on checkout). Or if this suits you better, use different tables for checkin (the current table without field `checkout`) and checkout (having `checkin`.`id` as PK and checkout time as `time`; there is no need for the other fields). — axiac, Nov 26 '14 at 08:34
but there is no unique way of pairing check ins to the tasks or users. — Matt, Nov 26 '14 at 08:35
@Matt There isn't one strictly, each check in and check out period lasts from the time there is a check in for that user and task combination until the time there's a check out for that user and task combination.It's a legacy thing, I can't really change the table because other programs depend on this particular behaviour. — user1090729, Nov 26 '14 at 08:36
possible duplicate of [Difference between two dates in MySQL](http://stackoverflow.com/questions/4759248/difference-between-two-dates-in-mysql) — CodeSlayer, Nov 26 '14 at 08:37
@user1090729 unfortunately the only logic in the pairing is in your head, therefore it cannot be done with the current dataset, there is nothing to group them by. — Matt, Nov 26 '14 at 08:40
@Matt Thank you, if I understand you correctly you do not think there's a way to make this query in SQL and I should just code it in the same roundabout way it was originally implemented? — user1090729, Nov 26 '14 at 08:43

Konstantin · Accepted Answer · 2014-11-26T09:42:36.307

As all comments tell you, your current table structure is not ideal. However it's still prossible to pair checkins with checkouts. This is a SQL server implementation but i am sure you can translate it to MySql:

SELECT id
    , user_id
    , task
    , minutes_per_each_task_instance = DATEDIFF(minute, time, (
            SELECT TOP 1 time
            FROM test AS checkout
            WHERE checkin.user_id = checkout.user_id 
                AND checkin.task = checkout.task 
                AND checkin.id < checkout.id 
                AND checkout.checkout = 1
            ))
FROM test AS checkin
WHERE checkin.checkout = 0

Above code works but will become slower and slower as your table starts to grow. After a couple of hundred thousands it will become noticable

I suggest renaming time column to checkin and instead of having checkout boolean field make it datetime, and update record when user checkouts. That way you will have half the number of records and no complex logic for reading or querying

Thank you for your help, this has helped me greatly. Yes indeed, being able to change the table would make my life way easier, there is however somewhat draconian company policy disallowing users from having update privileges for what they claim are auditing purposes. Even if there wasn't, I'm not even allowed create or alter privileges so I wouldn't be able to do it. Sorry to rant, this has me a bit frustrated. I'll edit your post with my mysql translation and mark your answer as accepted. Thanks again. — user1090729, Nov 26 '14 at 10:15

score 2 · Answer 2 · answered Nov 26 '14 at 10:19

You can determine with a ranking method what are the matching check in/ check out records, and calculate time differences between them

In my example new_table is the name of your table


    SELECT n.user, n.task,n.time, n.checkout , 
           CASE WHEN @prev_user = n.user 
                 AND @prev_task = n.task 
                 AND @prev_checkout = 0 
                 AND n.checkout = 1 
                 AND @prev_time IS NOT NULL
                THEN HOUR(TIMEDIFF(n.time, @prev_time)) END AS timediff,
           @prev_time := n.time,
           @prev_user := n.user, 
           @prev_task := n.task, 
           @prev_checkout := n.checkout
      FROM new_table n, 
           (SELECT @prev_user = 0, @prev_task = 0, @prev_checkout = 0, @prev_time = NULL) a
    ORDER BY user, task, `time`

Then sum the time differences (timediff) by wrapping it in another select

    SELECT x.user, x.task, sum(x.timediff) as total
      FROM ( 
            SELECT n.user, n.task,n.time, n.checkout , 
            CASE WHEN @prev_user = n.user 
                  AND @prev_task = n.task 
                  AND @prev_checkout = 0 
                  AND n.checkout = 1 
                  AND @prev_time IS NOT NULL
                 THEN HOUR(TIMEDIFF(n.time, @prev_time)) END AS timediff,
                  @prev_time := n.time,
                  @prev_user := n.user, 
                  @prev_task := n.task, 
                  @prev_checkout := n.checkout
             FROM new_table n,
                  (@prev_user = 0, @prev_task = 0, @prev_checkout = 0, @prev_time = NULL) a
            ORDER BY user, task, `time`
               ) x
      GROUP BY x.user, x.task

It would probably be easier to understand by changing the table structure though. If that is at all possible. Then the SQL wouldn't have to be so complicated and would be more efficient. But to answer your question it is possible. :)

In the above examples, names prefixed with '@' are MySQL variables, you can use the ':=' to set a variable to a value. Cool stuff ay?

Thank you for taking the time to answer, sadly, I had just accepted another answer just seconds prior to seeing yours. Your approach is very interesting! I had no idea these things were possible in mysql, would you say this is more efficient than what was proposed in the accepted answer? — user1090729, Nov 26 '14 at 10:44
His code is Microsoft SQL, change for Limit 1. It could be that it's very efficient if the interpreter does it's job correctly. But you'd have to test it on different size (10K records then 1Mil records) to know which one is actually faster. I think on 1Mil records his code would probably be faster because ordering is not fast in any SQL. But it's really a matter of testing, result may vary. — phil, Nov 26 '14 at 22:27
I would like to point out though that @Konstantin code relys on the fact that primary key describes the order in which events happened. If that is true, he will be fine. But if the system allows backdating task logs, then his results may be incorrect. He may accidently associate Id1 with Id8 and Id3 with Id 7 if the primary keys were in a different order. That is why I would prefer going with an ORDER BY. That way you are sure to associate the checkIn with the closest checkout. And ignore Data Abberations such as 2 subsequent checkins for the same task & user. If that is possible by accident. — phil, Nov 26 '14 at 22:35
Thank you, times are set by the server and so are the ids, so in _theory_ that could never happen. Having said that, in case this became a real problem, would checking whether checkin.time < checkout.time instead of checkin.id < checkout.id (as he did) fix this? — user1090729, Nov 27 '14 at 10:41

score 0 · Answer 3 · answered Nov 26 '14 at 08:52

Select MAX of checkouts and checkins independently, map them based on user and task and calculate the time difference

select user, task, 
SUM(UNIX_TIMESTAMP(checkin.time) - UNIX_TIMESTAMP(checkout.time)) from (
(select user, task, MAX(time) as time
from checkouts
where checkout = 0
group by user, task) checkout
inner join
(select user, task, MAX(time) as time
from checkouts
where checkout = 1
group by user, task) checkin 
on (checkin.time > checkout.time 
and checkin.user = checkout.user 
and checkin.task = checkout.task)) c

score 0 · Answer 4 · answered Nov 26 '14 at 10:34

This should work. Join on the tables and select the minimum times

SELECT 
  `user`,
  `task`,
  SUM(
    UNIX_TIMESTAMP(checkout) - UNIX_TIMESTAMP(checkin)
  ) 
FROM
  (SELECT 
    so1.`user`,
    so1.`task`,
    MIN(so1.`time`) AS checkin,
    MIN(so2.`time`) AS checkout 
  FROM
    so so1 
    INNER JOIN so so2 
      ON (
        so1.`id` = so2.`id` 
        AND so1.`user` = so2.`user` 
        AND so1.`task` = so2.`task` 
        AND so1.`checkout` = 0 
        AND so2.`checkout` = 1 
        AND so1.`time` < so2.`time`
      ) 
  GROUP BY `user`,
    `task`,
    so1.`time`) a 
GROUP BY `user`,
  `task` ;

As others have suggested though, This will not scale too well as it is, you would need to adjust it if it starts handling more data

How can I make an SQL query that returns time differences between checkins and checkouts?

4 Answers4