0

I have a pretty complex query (lots of joins) that returns this dataset:

+------------+-----------+-----------+---------------------+---------------------+-----------+------------+
| vehicle_id | pickup_id | driver_id | pickup_time         | dropoff_time        | code      | name       |
+------------+-----------+-----------+---------------------+---------------------+-----------+------------+
|         13 |       154 |        23 | 2021-08-18 22:30:00 | 2021-08-18 23:30:00 | 17526     | Natalie    |
|         13 |        16 |        23 | 2021-08-18 23:35:00 | 2021-08-19 00:35:00 | 17245     | Mark       |
|         12 |      2879 |        15 | 2021-08-19 12:00:00 | 2021-08-19 21:00:00 | 17517     | Will       |
+------------+-----------+-----------+---------------------+---------------------+-----------+------------+

If I add a GROUP BY vehicle_id to this query I get the Mark and Will rows, but I'd expect the Natalie row. Even if I change the inner query

Here is the full query, with the inner simplified a little bit:

SELECT * FROM (
    SELECT * 
    FROM trips 
    WHERE vehicle_id IS NOT NULL 
    ORDER BY pickup_time ASC
) AS tmp 
GROUP BY vehicle_id

Regardless if I do the ORDER BY with ASC or DESC I'm still getting the Mark and Will records. I cannot get the Natalie one.

What am I doing wrong with my query?

Bing
  • 3,071
  • 6
  • 42
  • 81
  • What makes you think `GROUP BY` will always choose the first row in each group from the subquery. – Barmar Aug 18 '21 at 20:46
  • I think [`ORDER BY` in subquery is inconsequential](https://stackoverflow.com/questions/18684901/mysql-subquery-order), I'd recommend using [`RANK()` windowing function](https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html). – orhtej2 Aug 18 '21 at 20:50
  • @Barmar What does it determine to group on, then? I don't see any explanation in the documentation or on Slack. – Bing Aug 18 '21 at 20:54
  • It's not deterministic. If you want a specific row, you need to select it specifically. See the linked question. – Barmar Aug 18 '21 at 21:02
  • The way MySQL is implemented, it will return the first row in the group, i.e. the first row it reads in index order. The index it is reading from may not be the one you expect. But this is not documented behavior, it's just a consequence of the implementation, which might change. So you shouldn't depend on it. You should use SQL logic to ensure the row you want is the one returned. This type of question has been answered many times on Stack Overflow, typically with the [tag:greatest-n-per-group] tag. – Bill Karwin Aug 18 '21 at 21:30

0 Answers0