0

I would like to perform a query that will select only the most recent item from a given group.


In this example, I'm tracking vans:

  • Each time they return to base, a check-in is recorded with information - mileage, etc...
  • Each time they make a delivery, a delivery is recorded - customer, etc...

This table lets us know the history for a given van. The data can be produced with a query or stored as we go - this isn't the problem.

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 24 | 15         | NULL        | 3
 25 | NULL       | 28          | 3
 26 | 16         | NULL        | 4
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I can see the van's history by querying with ... WHERE van_id=3; - fine.

Conversely, I would like to be able to get a list of vans with their most recent "event". Resulting in this:

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I jumped to the following query:

SELECT * FROM `history` GROUP BY `van_id`;

But this returns the following error:

#1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.history.checkin_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

After reading up, I understand what this is about and have to admit that my SQL is somewhat out of date - which of the items from the group do I want returned?

Adding checkin_id and delivery_id to the GROUP BY just shifts the problem - Ultimately I end up with the same set of data, just sorted differently.


This answer piqued my interest, and the graphic really helps to clearly outline the problem, thanks @azerafati!

I want to use the FIRST() or LAST() aggregate function - but MySQL doesn't appear to have them.

How do I reproduce this behaviour without processing all of the data in my application?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Attie
  • 6,690
  • 2
  • 24
  • 34
  • Learn to use proper `GROUP BY`. Then you won't have this error. – Eric Jul 20 '18 at 18:17
  • The reason you get that particular error is because recent versions of MySQL default to disallowing it. As far as I know, MySQL is the only RDBMS that allows it at all; the problem is that unless the non-grouped, non-aggregated fields selected are "functionally dependent" on the grouped ones, the values used are not guaranteed to be a specific one encountered (effectively a random selection from those encountered in the process of grouping). Basically, if your configuration allowed it, your query could get a result row like (27, 15, 28, 3). – Uueerdo Jul 20 '18 at 18:22
  • You can tell how your code doesn't make sense. When you grouping by `van_id`, and you don't put aggregation on `checkin_id` and `delivery_id`, how would MySQL know which one to get?? For `van_id 3`, why `checkin_id` is `NULL`, not 15??? Why would `delivery_id` is `29`, not `NULL`??? – Eric Jul 20 '18 at 18:22
  • You are trying to use MySQL's deprecated non-standard extension to `GROUP BY`. Please read [this](https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html). – O. Jones Jul 20 '18 at 20:41

2 Answers2

8

I guess your id values are unique, and later records have higher values than earlier records.

You need to use a subquery that gets the latest id for each van:

         SELECT MAX(id) id, van_id
           FROM history
          GROUP BY van_id

Then join that to your detail query.

 SELECT h.*
   FROM history h
   JOIN (
         SELECT MAX(id) id, van_id
           FROM history
          GROUP BY van_id
        ) m ON h.id = m.id AND h.van_id = m.van_id

But because your id values are unique you can simplify this even more.

 SELECT h.*
   FROM history h
   JOIN (
         SELECT MAX(id) id
           FROM history
          GROUP BY van_id
        ) m ON h.id = m.id
O. Jones
  • 103,626
  • 17
  • 118
  • 172
4

I was going to mark this as a duplicate because the question is actually asked fairly frequently, but I found those question/answers seem fairly hard to search for; so here is the generic template:

SELECT t.*
FROM theTable AS t
INNER JOIN (
   SELECT groupingValue, MIN(someValue) AS lowestValue
   FROM theTable
   GROUP BY groupingValue
) AS rIdent ON rIdent.groupingValue = t.groupingValue AND rIdent.lowestValue= t.someValue

lowest in your particular case being the min(id)...oh, oops; your question says first, but the detail says most recent (which I would interpret as last), so just use MAX instead of MIN. ...and the "groupingValue" is van_id.

Edit: The query should be fairly efficient if there is an index on the grouping fields and the field used to identify lowest/first/highest/recent .

Uueerdo
  • 15,723
  • 1
  • 16
  • 21