How to get my GROUP BY to take the value of the last row (by date) for a specific column?

Question

I have a query which has a GROUP BY clause. This query takes employees "checkin" times from a records table and if the dn field on is 1 then the employee is in the office, but if it's 2, then he/she left.

So lets say I put records.dn in the GROUP BY clause. This results with duplicate entries for each employee (I get their names one after the other in a table, twice).

I tried using MIN and MAX in the select, but that didn't make sense at all.

So from those multiple rows in the database where an employee might checkin and out multiple times, how do I take the LATEST row and use its dn field?

This is the query I'm talking about:

select 
    MAX(records.dn), 
    `records`.`din`, 
    `users`.`username`, 
    `users`.`id` as `user_id`, 
    DATE(records.created_at) AS date, 
    TIME(MIN(records.created_at)) AS first_log, 
    TIME(MAX(records.created_at)) AS last_log 
from `records` 
inner join `users` on `records`.`din` = `users`.`din` 
where records.created_at BETWEEN '2016-09-05 00:00:00' 
    AND '2016-09-08 00:00:00' 
group by `users`.`username`, DATE(records.created_at) 
order by `first_log` asc

(The first select column is what I tried to do, using MAX)

This is the records table with some sample data:

How do I achieve what I'm looking for?

I have been in this situation yesterday. the best solution I have found is in here http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group/1313293#1313293 — Accountant م, Sep 10 '16 at 14:49
What are your Expected Results? A sqlfiddle never breaks anyone's heart :p — Drew, Sep 10 '16 at 15:04
See http://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query — Strawberry, Sep 10 '16 at 15:12

Gordon Linoff · Accepted Answer · 2016-09-10T15:53:46.593

1

The substring_index()/group_concat() trick might do exactly what you want:

select substring_index(group_concat(r.dn order by created_at desc), ',', 1) as last_dn, 
       u.username, u.id as `user_id`, 
       DATE(r.created_at) AS date, 
       TIME(MIN(r.created_at)) AS first_log, 
       TIME(MAX(r.created_at)) AS last_log 
from records r inner join
     users u
     on r.din = u.din 
where r.created_at >= '2016-09-05' and
      r.created_at < '2016-09-08' 
group by u.username, DATE(r.created_at) 
order by first_log asc;

As written, this has three limitations:

It assumes that dn doesn't have a comma in it (that is easily fixed by using a different separator).
It assumes that the list of dns for a user for a day won't exceed the maximum length of a string for group_concat(). That can also be adjusted using a system parameter.
The last_dn is an string, even if that is not the original type.

But this trick works well in most cases. The parameter group_concat_max_len is the size of the intermediate string. Its default (described here) is 1,024. This should be fine for up to a hundred or more integers. It is a session parameter and easily changed.

Also note the following changes:

I think unnecessary backticks in a query just make it harder to read.
The use of table aliases makes a query easier to write and to read.
I don't think you mean between, because it is inclusive. I am guessing that you want date/time values strictly before the 8th.

edited Sep 10 '16 at 15:53

answered Sep 10 '16 at 15:31

Gordon Linoff

1,242,037
58
646
786

Thank you for the answer, I'm gonna try it shortly. Just a questions and clarification before I do that... What the maximum length for `group_concat()`? And this is a Laravel outputted query, so the framework adds the ticks by itself, I just copied it here :). the `between` clause with the dates is also dynamic -- I need it for a daterange picker. – aborted Sep 10 '16 at 15:51
SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ' `records`.`din`, `users`.`username`, `users`.`id` as `user_id`, DATE(records.cr' at line 1 (SQL: select substring_index(group_concat(records.dn order by created_at desc), , `records`.`din`, `users`.`username`, `users`.`id` as `user_id`, DATE(records.created_at) as date, TIME(MIN(records.created_at)) as first_log, TIME(MAX(records.created_at)) – aborted Sep 10 '16 at 16:00
as last_log from `records` inner join `users` on `records`.`din` = `users`.`din` where records.created_at between '2016-09-05 00:00:00' and '2016-09-08 00:00:00' group by `users`.`username`, DATE(records.created_at) order by `first_log` asc) – aborted Sep 10 '16 at 16:00
1

@Aborted . . . Your syntax error does not correspond to the query in my answer. There is no `records.din` in my query. – Gordon Linoff Sep 10 '16 at 16:03
I removed `records.din`, added `substring_index(group_concat(records.dn order by created_at desc), ',', 1) as last_dn` and added ticks to `user_id`. With the way my query is built, I have no way of adding aliases (r and u) for the tables. Would that break the query? – aborted Sep 10 '16 at 16:12
That should be fine. You should remove the `dn` column in the `group by` as well. – Gordon Linoff Sep 10 '16 at 16:23

How to get my GROUP BY to take the value of the last row (by date) for a specific column?

1 Answers1