MYSQL to order before grouping by

Question

I have the following:

user_id   date_created   project_id
3         10/10/2013     1
3         09/10/2013     1
5         10/10/2013     1
8         10/10/2013     1
10        10/10/2013     1
3         08/10/2013     1

The end result i want is:

user_id   date_created   project_id
3         10/10/2013     1
5         10/10/2013     1
8         10/10/2013     1
10        10/10/2013     1

Context:

I have this thing called an influence, and a user can have many influences for a project. I want to get a list of the latest influence from a user on a project.

I have tried:

select * from influences 
where project_id = 1 
group by user_id
ORDER BY created_at DESC

but of course this ignores first ordering by user created at, and then ordering the full list. It simply just squishes the users together and orders the end list

THE LARAVEL - Eloquent FOR THE ANSWER PROVIDED IS THIS:

return Influence::select( "user_id",  "influence", DB::raw( "MAX(created_at) as created_at" ) )
                ->where( "project_id", "=", $projectID )
                ->groupBy( "user_id", "project_id" )->get();

This is the most frequently asked question under the mysql tag. Just sayin' — Strawberry, Jan 12 '15 at 12:07
This is the way `GROUP BY` works. Use the MAX() aggregate value to get the most recent `date_created`. — axiac, Jan 12 '15 at 12:31

score 4 · Accepted Answer · answered Jan 12 '15 at 12:07

4

You don't want to order before group by, because given the structure of your query, it won't necessary do what you want.

If you want the most recently created influence, then get it explicitly:

select i.*
from influences i join
     (select user_id, max(created_at) as maxca
      from influences i
      where project_id = 1
      group by user_id
     ) iu
     on iu.user_id = i.user_id and iu.maxca = i.created_at
where i.project_id = 1;

Your intention is to use a MySQL extension that the documentation explicitly warns against using. You want to include columns in the select that are not in the group by. As the documentation says:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

answered Jan 12 '15 at 12:07

Gordon Linoff

1,242,037
58
646
786

Cheers man! I actually didn't realise you would do a select on a join... Very helpful and it works great! – Jimmyt1988 Jan 12 '15 at 12:10
Is there a reason for the downvote? This appears to be a correct answer. – Gordon Linoff Jan 12 '15 at 12:32
I am the downvoter. Is there a reason for sub-queries? – axiac Jan 12 '15 at 12:53
@axiac . . . What do you mean? You need the subquery to get the most recent `created_at` date for each user. This allows the query to pull all the columns from `influences`. – Gordon Linoff Jan 12 '15 at 13:14
In the example code, all the columns are `user_id` (included in `GROUP BY`), `project_id` (always `1`, filtered in the `WHERE` clause) and `MAX(created_at)`. A simple `SELECT` with `GROUP BY` is enough. If there are other fields (the OP didn't mention them yet) then the query is different but it still doesn't need inner queries (and even `GROUP BY` becomes useless). – axiac Jan 12 '15 at 13:20
@axiac . . . This is a more general solution. – Gordon Linoff Jan 12 '15 at 13:20

axiac · Answer 2 · 2015-01-12T12:53:41.670

Use this:

SELECT user_id, project_id, MAX(date_created) as latest
FROM influences
WHERE project_id = 1
GROUP BY user_id, project_id

How it works: MySQL selects all the rows that match the WHERE conditions and sorts them by user_id then, for each user_id by project_id. From each set of rows having the same user_id and project_id it will produce a single row in the final result set.

You can use in the SELECT clause the columns used in the GROUP BY clause (user_id and project_id); their values are unambiguous: all the rows from each group have the same user_id and project_id.

You can also use aggregate functions. Each of them uses one column from all the rows in the group to compute a single value. The most recent created_at is, of course, MAX(created_at).

If you select a column that is neither included in the GROUP BY clause, nor passed to an aggregate function (like created_at you have in your query), MySQL has no hint how to compute that value. The standard SQL forbids it (the query is not valid) but MySQL allows it. It will simply pick a value from that column but there is no way to make it pick it from a specific row because this is, in fact, undefined behaviour.

You can omit the project_id from the GROUP BY clause because the WHERE clause will make all the rows having the same project_id. This will coincidentally make the result correct even if project_id does not appear in a GROUP BY clause and it's not computed using an aggregate function.

I recommend you to keep project_id into the GROUP BY clause. It doesn't affect the result or the query speed and it allows you to loose the filtering conditions (f.e. use WHERE project_id IN (1, 2)) always get the correct result (this doesn't happen if you remove it from GROUP BY).

This answer only handles the columns displayed in the example code from the question. If the table `influences` contains other columns (not displayed here) then this answer doesn't work. The correct answer in that case can be found on the dozens of similar questions on SO. — axiac, Jan 12 '15 at 13:25
I've just realised that the outcome isn't correct. It shuffles the columns around and it's difficult to tell why... however the other one definitely works... If you include the SELECT to have "influence" the influence is being assigned to the wrong user — Jimmyt1988, Jan 12 '15 at 16:57
As I said in my previous comment on this answer, it works only if the table contains just the three columns exposed in the question. It also provides correct results if any other columns that are selected in the database have the same value on all the rows that have the same `user_id`. — axiac, Jan 12 '15 at 17:22
If you need to select more columns then you need a different query. Check [this answer](http://stackoverflow.com/questions/27183964/sql-sorting-does-not-follow-group-by-statement-always-uses-primary-key/27184343#27184343), [this one](http://stackoverflow.com/questions/27801243/how-to-select-a-row-with-maximum-value-for-a-column-in-mysql/27801835#27801835) and [this one](http://stackoverflow.com/questions/612231/how-can-i-select-rows-with-maxcolumn-value-distinct-by-another-column-in-sql/27802817#27802817) on questions similar with yours. Add the `WHERE product_id = 1` condition to them. — axiac, Jan 12 '15 at 17:26

MYSQL to order before grouping by

2 Answers2