Acceptable technique for group-wise maximum in MySQL

Question

As "everyone knows", you can't return non-grouped, non-aggregated columns from a GROUP BY query; in other words, you can't ask "give me the ID, name and address of the employee with the highest salary in each department." Of course, this isn't quite true in MySQL: http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html But that page contains a rather ominous warning:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

MySQL has another article on this problem: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html But the technique recommended there doesn't actually take advantage at all of hidden columns. There's a comment in that article from Kasey Speakman, who recommends using an ordered subquery, like so:

select deptno, emp_id, address, name from
(select * from emp order by salary desc) as sorted_emp
group by deptno

My questions are: a) Can I safely rely on MySQL to pick the "first" row from each group, since the subquery is ordered, and b) in general, and assuming appropriate indexes, is this likely to perform better than, say, the LEFT JOIN technique mentioned in the same article?

score 1 · Accepted Answer · edited May 23 '17 at 10:24

There was recently a discussion on a similar question here: SQL: What is the default Order By of queries?

Nevertheless, I think ranking queries are an example of queries in MySQL where it is quite useful to rely on a predictable order (predictable because specific indexes are used).

Look at my answer to the following question: Retrieving the last record in each group

Here are the answers to your questions:

a) Yes, sometimes you can rely on the order when you know the engine and the indexes used, though doing so is generally frowned upon.
b) When there are many items within each group, the LEFT JOIN solution might take too long to execute, so relying on bare indexes might become almost the only option. But the solution should not generate huge intermediate temporary tables.
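
For reference, the LEFT JOIN technique mentioned in the question can be sketched roughly as follows. This is a minimal sketch, not the manual's exact example: the `emp` columns follow the question's query, the sample rows are invented, and SQLite (through Python's `sqlite3` module) is used only so that the snippet runs on its own. The SQL itself is standard and should behave the same way in MySQL.

```python
import sqlite3

# In-memory database with a hypothetical emp table (columns taken from the
# question's query; the rows are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        deptno  INTEGER,
        name    TEXT,
        address TEXT,
        salary  INTEGER
    );
    INSERT INTO emp VALUES
        (1, 10, 'Ann', '1 Elm St', 5000),
        (2, 10, 'Bob', '2 Oak St', 7000),
        (3, 20, 'Cid', '3 Ash St', 4000),
        (4, 20, 'Dee', '4 Fir St', 3000);
""")

# Self-join: e2 matches every row in the same department with a higher salary.
# A row of e1 with no such match (e2.emp_id IS NULL) holds the group maximum.
LEFT_JOIN_MAX = """
    SELECT e1.deptno, e1.emp_id, e1.address, e1.name
    FROM emp e1
    LEFT JOIN emp e2
           ON e2.deptno = e1.deptno
          AND e2.salary > e1.salary
    WHERE e2.emp_id IS NULL
    ORDER BY e1.deptno
"""

left_join_rows = conn.execute(LEFT_JOIN_MAX).fetchall()
print(left_join_rows)
```

The join compares each row with the other rows of its group, which is why its cost grows quickly when groups are large.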

But your query:

select deptno, emp_id, address, name from
(select * from emp order by salary desc) as sorted_emp
group by deptno

is the worst possible idea, since it generates an unindexed copy of your table and then operates on that copy without making use of any optimizations.

answered Jan 10 '12 at 18:58

newtover

Well, my example was taken from the one in that article, that's why I was asking about it. In my real case the inner query would have a `where` clause that would be selecting a fairly small subset of data using an index. But it's true that the column I'm ordering by (and would be calling `MAX` on in the ordinary case) is not indexed. – Dan Jan 10 '12 at 20:36
@Dan, at best your solution should return the ids of the rows in a subquery, which you should join with the table once again to get the rest of the data. – newtover Jan 10 '12 at 20:42
I see. Your solution looks interesting but I don't quite understand it yet - I've never dealt with variable assignment inside queries before. – Dan Jan 10 '12 at 20:54
@Dan, I have just realized that I have not articulated the main idea there. And your question directed me to a simpler solution based on the same idea. I will update my answer there. – newtover Jan 11 '12 at 06:15
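
One way to read the suggestion in the comments (let a subquery return only what identifies each group's winning row, then join back to the table for the remaining columns) is sketched below. Note the substitution: the subquery here returns each department's maximum salary rather than row ids, which keeps the result deterministic. The table layout and rows are assumptions for illustration, and SQLite (through Python's `sqlite3`) stands in for MySQL only so that the snippet is runnable.

```python
import sqlite3

# Hypothetical emp table; dept 20 has a tie on the maximum salary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        deptno  INTEGER,
        name    TEXT,
        address TEXT,
        salary  INTEGER
    );
    INSERT INTO emp VALUES
        (1, 10, 'Ann', '1 Elm St', 5000),
        (2, 10, 'Bob', '2 Oak St', 7000),
        (3, 20, 'Cid', '3 Ash St', 4000),
        (4, 20, 'Dee', '4 Fir St', 3000),
        (5, 20, 'Eve', '5 Yew St', 4000);
""")

# The derived table m is small (one row per department); joining it back to
# emp fetches the non-grouped columns without relying on hidden-column behavior.
JOIN_BACK = """
    SELECT e.deptno, e.emp_id, e.address, e.name
    FROM emp e
    JOIN (SELECT deptno, MAX(salary) AS max_salary
          FROM emp
          GROUP BY deptno) m
      ON m.deptno = e.deptno
     AND m.max_salary = e.salary
    ORDER BY e.deptno, e.emp_id
"""

join_back_rows = conn.execute(JOIN_BACK).fetchall()
print(join_back_rows)
```

With this form, rows that tie on the maximum are all returned (both Cid and Eve for department 20 in the sample data), so a tie-breaking rule would be needed if exactly one row per group is required.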