How do I pair rows together in MySQL?

Question

I'm working on a simple time tracking app.

I've created a table that logs the IN and OUT times of employees.

Here is an example of how my data currently looks:

E_ID | In_Out |      Date_Time
------------------------------------
  3  |   I    | 2012-08-19 15:41:52
  3  |   O    | 2012-08-19 17:30:22
  1  |   I    | 2012-08-19 18:51:11
  3  |   I    | 2012-08-19 18:55:52
  1  |   O    | 2012-08-19 20:41:52
  3  |   O    | 2012-08-19 21:50:30

I'm trying to create a query that will pair the IN and OUT times of an employee into one row like this:

E_ID |       In_Time       |      Out_Time
------------------------------------------------
  3  | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
  3  | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
  1  | 2012-08-19 18:51:11 | 2012-08-19 20:41:52

I hope I'm being clear about what I'm trying to achieve here. Basically, I want to generate a report that has both the in and out times merged into one row.

Any help with this would be greatly appreciated. Thanks in advance.

I'm a web designer, I'm just starting to get my feet wet in MySQL and am trying to create a simple time keeping app as a project. — patskot, Aug 24 '12 at 21:08
@Patrick so you want to see all in/out for each employee even if there are multiple? — Taryn, Aug 24 '12 at 21:10
@bluefeet yes, I want each In and its corresponding Out to be paired. — patskot, Aug 24 '12 at 21:12
Why not in this case keep it simple and just have In_Time and Out_Time fields in the table, and ditch the In_Out flag field? Default Out_Time to NULL. — Paul McNett, Aug 24 '12 at 21:15
Do these tables not have some other column? How do you know which "in time" goes with which "out time"? This schema makes things complex in totally unnecessary ways. — Dan Grossman, Aug 24 '12 at 21:17
@PaulMcNett so you're suggesting (and excuse my ignorance if I'm completely off the mark) that when an employee clocks in it creates a new entry with the In_Time populated and the Out_Time set to NULL. Then when the employee clocks out the row/entry just gets updated with the Out_Time? — patskot, Aug 24 '12 at 21:21
Exactly so. You can add safety checks: you can't clock out if there is no row with a suitable datetime and a NULL exit time, and you can't clock in if there is an outstanding row with a NULL exit time. Or better, you can, but the anomaly should get logged. (Can't have employees exiting from the bathroom window in the back.) — LSerni, Aug 24 '12 at 21:23
@Patrick are you expecting there to be more than 2 in/out entries per employee? — Taryn, Aug 24 '12 at 21:23
@bluefeet yes, this would generate a weekly report with multiple instances of an employee clocking in and out. — patskot, Aug 24 '12 at 21:28
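
The one-row-per-shift design suggested in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `shifts` table and the `clock_in` / `clock_out` helpers are hypothetical names, and SQLite (via Python's `sqlite3`) stands in for MySQL so the snippet runs on its own. Here an anomaly is simply refused; logging it instead, as suggested, would be a small change.

```python
import sqlite3

# Hypothetical table: one row per shift; out_time stays NULL until clock-out.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts ("
    " e_id INTEGER NOT NULL,"
    " in_time TEXT NOT NULL,"
    " out_time TEXT)"
)

def clock_in(e_id, when):
    """Open a shift unless the employee already has one open."""
    open_shift = conn.execute(
        "SELECT 1 FROM shifts WHERE e_id = ? AND out_time IS NULL", (e_id,)
    ).fetchone()
    if open_shift is not None:
        return False  # anomaly: already clocked in
    conn.execute("INSERT INTO shifts (e_id, in_time) VALUES (?, ?)", (e_id, when))
    return True

def clock_out(e_id, when):
    """Close the employee's open shift; return False if there is none."""
    cur = conn.execute(
        "UPDATE shifts SET out_time = ?"
        " WHERE e_id = ? AND out_time IS NULL AND in_time < ?",
        (when, e_id, when),
    )
    return cur.rowcount == 1

clock_in(3, "2012-08-19 15:41:52")
clock_out(3, "2012-08-19 17:30:22")
print(conn.execute("SELECT * FROM shifts").fetchall())
```

With this layout the report in the question becomes a plain SELECT over `shifts`, with no pairing step.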

spencer7593 · Accepted Answer · 2012-08-24T22:10:08.623

There are three basic approaches I can think of.

One uses a theta-JOIN, another uses a correlated subquery in the SELECT list, and a third makes use of MySQL user variables.

theta-JOIN

The first approach is a theta-JOIN. It is generic SQL (no MySQL-specific functions), so it can work with multiple RDBMSs.

N.B. With a large number of rows, this approach can create a significantly large intermediate result set, which can lead to problematic performance.

SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
  FROM e o
  LEFT
  JOIN e i ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time

What this does is match every 'O' row for an employee with every 'I' row that is earlier, and then we use the MAX aggregate to pick out the 'I' record with the closest date time.

This works for perfectly paired data, but it can produce odd results for imperfect pairs. For example, two consecutive 'O' records with no intermediate 'I' row will both be matched to the same 'I' row.
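
To make that caveat concrete, here is a small sketch that runs the same theta-JOIN against the sample data from the question, first as-is and then with one stray 'O' row appended. It assumes SQLite (through Python's `sqlite3`) as a stand-in for MySQL; the `pair_rows` helper and the extra 22:15:00 row are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (e_id, in_out, date_time)
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

THETA_JOIN = """
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
  FROM e o
  LEFT JOIN e i
    ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time
"""

def pair_rows(rows):
    """Load rows into an in-memory table and run the theta-JOIN (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
    conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
    result = conn.execute(THETA_JOIN).fetchall()
    conn.close()
    return result

clean = pair_rows(ROWS)
# A second 'O' for employee 3 with no 'I' in between (invented for the demo).
stray = pair_rows(ROWS + [(3, "O", "2012-08-19 22:15:00")])

for row in stray:
    print(row)
```

In the second run, the last two rows for employee 3 report the same in_time (18:55:52). A duplicate in_time per employee is a cheap signal that the log has an unpaired 'O'.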

correlated subquery in SELECT list

Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but it is sometimes workable, and is occasionally the fastest way to return the specified result set. It works best when the outer query returns a limited number of rows.

 SELECT o.e_id
      , (SELECT MAX(i.date_time)
           FROM e `i`
          WHERE i.in_out = 'I'
            AND i.e_id = o.e_id
            AND i.date_time < o.date_time
        ) AS in_time
      , o.date_time AS out_time
   FROM e `o`
  WHERE o.in_out = 'O'
  ORDER BY o.date_time

User variables

Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the "missing" analytic functions.)

What this query does is order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an 'O' (out) row, we use the value of date_time from the most recent preceding 'I' row for that employee as the in_time.

N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or "derived tables", in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.

SELECT c.e_id
     , CAST(c.in_time AS DATETIME) AS in_time
     , c.out_time
  FROM (
         SELECT IF(@prev_e_id = d.e_id,@in_time,@in_time:=NULL) AS reset_in_time
              , @in_time := IF(d.in_out = 'I',d.date_time,@in_time) AS in_time
              , IF(d.in_out = 'O',d.date_time,NULL) AS out_time
              , @prev_e_id := d.e_id  AS e_id
           FROM (
                  SELECT e_id, date_time, in_out 
                    FROM e
                    JOIN (SELECT @prev_e_id := NULL, @in_time := NULL) f
                   ORDER BY e_id, date_time, in_out 
                 ) d
       ) c
 WHERE c.out_time IS NOT NULL
 ORDER BY c.out_time

This works for the set of data you have. It needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, when the rows are not perfectly paired (e.g. two 'O' rows with no 'I' row between them, an 'I' row with no subsequent 'O' row, etc.)
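
On engines that do have analytic (window) functions, which includes MySQL from 8.0 onward, the same "previous row" idea can be written with LAG instead of user variables. The sketch below is not the query from this answer but a swapped-in variant. It assumes SQLite 3.25 or later (through Python's `sqlite3`) as a stand-in for MySQL, and it deliberately keeps an 'O' row only when the row immediately before it for the same employee is an 'I', so stray 'O' rows are dropped rather than matched to an older 'I'.

```python
import sqlite3

# Sample rows from the question plus one stray 'O' (invented) for employee 3.
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
    (3, "O", "2012-08-19 22:15:00"),
]

LAG_QUERY = """
SELECT t.e_id, t.in_time, t.out_time
  FROM (
         SELECT e_id
              , in_out
              , LAG(in_out)    OVER (PARTITION BY e_id ORDER BY date_time) AS prev_flag
              , LAG(date_time) OVER (PARTITION BY e_id ORDER BY date_time) AS in_time
              , date_time AS out_time
           FROM e
       ) t
 WHERE t.in_out = 'O'
   AND t.prev_flag = 'I'
 ORDER BY t.out_time
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
conn.executemany("INSERT INTO e VALUES (?, ?, ?)", ROWS)
pairs = conn.execute(LAG_QUERY).fetchall()

for row in pairs:
    print(row)
```

Whether dropping unpaired rows is the right behavior depends on the report. Selecting the rows where `prev_flag` is not 'I' (or is NULL) would list the anomalies instead.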

SQL Fiddle

Taryn · Answer 2 · 2012-11-23T15:50:46.333

Unfortunately, MySQL doesn't have a ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) function like SQL Server does, or this would be incredibly easy.

But, there is a way to do this in MySQL:

set @num_in := 0, @id_in := NULL, @num_out := 0, @id_out := NULL;

select emp_in.id,
  emp_in.in_time,
  emp_out.out_time
from 
(
  -- number each employee's 'I' rows 1, 2, 3, ... in date_time order
  select id, in_out, date_time in_time, 
     @num_in := if(@id_in = id, @num_in + 1, 1) as row_num,
     @id_in := id as dummy
  from mytable
  where in_out = 'I'
  order by id, date_time
) emp_in
join
(
  -- number each employee's 'O' rows the same way, with separate variables
  select id, in_out, date_time out_time,
     @num_out := if(@id_out = id, @num_out + 1, 1) as row_num,
     @id_out := id as dummy
  from mytable
  where in_out = 'O'
  order by id, date_time
) emp_out
  on emp_in.id = emp_out.id
  and emp_in.row_num = emp_out.row_num
order by emp_in.id, emp_in.in_time

Basically, this creates two subqueries, each of which generates a row number for its records: one subquery is for the in times and the other is for the out times.

Then you JOIN the two subqueries together on the id and the row number.
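
For comparison, here is roughly what the ROW_NUMBER() version looks like on an engine that supports it (MySQL added window functions and common table expressions in 8.0). This is a sketch under assumptions, not a tested MySQL query: it runs on SQLite 3.25 or later through Python's `sqlite3`, it reuses the `mytable` and `id` names from the query above, and the `numbered` CTE name is made up. Each employee's 'I' rows and 'O' rows are numbered separately in date_time order, and the n-th 'I' is paired with the n-th 'O'.

```python
import sqlite3

# Sample rows from the question: (id, in_out, date_time)
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

ROW_NUMBER_QUERY = """
WITH numbered AS (
  SELECT id, in_out, date_time,
         ROW_NUMBER() OVER (PARTITION BY id, in_out ORDER BY date_time) AS rn
    FROM mytable
)
SELECT emp_in.id, emp_in.date_time AS in_time, emp_out.date_time AS out_time
  FROM numbered emp_in
  JOIN numbered emp_out
    ON emp_out.id = emp_in.id
   AND emp_out.rn = emp_in.rn
 WHERE emp_in.in_out = 'I'
   AND emp_out.in_out = 'O'
 ORDER BY emp_in.id, emp_in.date_time
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, in_out TEXT, date_time TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
pairs = conn.execute(ROW_NUMBER_QUERY).fetchall()

for row in pairs:
    print(row)
```

Like any pairing by position, this assumes the log is clean: one missing 'O' shifts every later pair for that employee.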

See SQL Fiddle with Demo
