Problem
Suppose I have this table tab
(fiddle available).
| g | a | b | v |
---------------------
| 1 | 3 | 5 | foo |
| 1 | 4 | 7 | bar |
| 1 | 2 | 9 | baz |
| 2 | 1 | 1 | dog |
| 2 | 5 | 2 | cat |
| 2 | 5 | 3 | horse |
| 2 | 3 | 8 | pig |
I'm grouping rows by g
, and for each group I want one value from column v
. However, I don't want any value, but I want the value from the row with maximal a
, and from all of those, the one with maximal b
. In other words, my result should be
| 1 | bar |
| 2 | horse |
Current solution
I know of a query to achieve this:
SELECT grps.g,
(SELECT v FROM tab
WHERE g = grps.g
ORDER BY a DESC, b DESC
LIMIT 1) AS r
FROM (SELECT DISTINCT g FROM tab) grps
Question
But I consider this query rather ugly. Mostly because it uses a dependant subquery, which feels like a real performance killer. So I wonder whether there is an easier solution to this problem.
Expected answers
The most likely answer I expect to this question would be some kind of add-on or patch for MySQL (or MariaDB) which does provide a feature for this. But I'll welcome other useful inspirations as well. Anything which works without a dependent subquery would qualify as an answer.
If your solution only works for a single ordering column, i.e. couldn't distinguish between cat
and horse
, feel free to suggest that answer as well as I expect it to be still useful to the majority of use cases. For example, 100*a+b
would be a likely way to order the above data by both columns while still using only a single expression.
I have a few pretty hackish solutions in mind, and might add them after a while, but I'll first look and see whether some nice new ones pour in first.
Benchmark results
As it is pretty hard to compare the various answers just by looking at them, I've run some benchmarks on them. This was run on my own desktop, using MySQL 5.1. The numbers won't compare to any other system, only to one another. You probably should be doing your own tests with your real-life data if performance is crucial to your application. When new answers come in, I might add them to my script, and re-run all the tests.
- 100,000 items, 1,000 groups to choose from, InnoDb:
- 0.166s for MvG (from question)
- 0.520s for RichardTheKiwi
- 2.199s for xdazz
- 19.24s for Dems (sequential sub-queries)
- 48.72s for acatt
- 100,000 items, 50,000 groups to choose from, InnoDb:
- 0.356s for xdazz
- 0.640s for RichardTheKiwi
- 0.764s for MvG (from question)
- 51.50s for acatt
- too long for Dems (sequential sub-queries)
- 100,000 items, 100 groups to choose from, InnoDb:
- 0.163s for MvG (from question)
- 0.523s for RichardTheKiwi
- 2.072s for Dems (sequential sub-queries)
- 17.78s for xdazz
- 49.85s for acatt
So it seems that my own solution so far isn't all that bad, even with the dependent subquery. Surprisingly, the solution by acatt, which uses a dependent subquery as well and which I therefore would have considered about the same, performs much worse. Probably something the MySQL optimizer can't cope with. The solution RichardTheKiwi proposed seems to have good overall performance as well. The other two solutions heavily depend on the structure of the data. With many groups small groups, xdazz' approach outperforms all other, whereas the solution by Dems performs best (though still not exceptionally good) for few large groups.