MySQL SELECT with GROUP BY one field and ORDER BY a different field

Question

EDIT 3 - MySQL version is 8.0.33.

EDIT 2 - See final working code at the bottom. Thanks @Akina !

I have a table of scores for a sporting event. The table has three relevant fields that I want to select -

scoreID as a primary key value
classifierID that maps to the primary key of another table which has details about particular course layouts
calculatedPercent that is the outcome of a particular event

The table also has three other fields that I use in a WHERE clause but those are incidental.

I need to generate a query that selects the four best values for calculatedPercent with the stipulation that no classifierID can be duplicated. I need to be able to capture the scoreID for use in a later stage of the process.

Here was my first query:

SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, `masterScores`.`calculatedPercent` 
FROM `masterScores` 
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE" 
ORDER BY `masterScores`.`calculatedPercent` DESC LIMIT 4

Initially I thought that this was great as it did select the scoreID value for the rows that had the highest calculatedPercent values for a given member. Then I noticed that several members had their first and second highest scores on the same course, which violates the requirement that classifierID values not be duplicated.

I played around with SELECT DISTINCT for a bit but eventually realized that what I really needed was GROUP BY so I did some research and found Error related to only_full_group_by when executing a query in MySql but that didn't completely solve my issue.

I next tried:

SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent 
FROM `masterScores` 
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE" 
GROUP BY `masterScores`.`classifierID` 
ORDER BY bestPercent DESC LIMIT 4

This me the following error message:

#1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column '.masterScores.calculatedPercent' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I considered using MIN and MAX for the masterScores.scoreID column but it wasn't matching expectations; the scoreID primary key value wasn't always the one for the selected calculatedPercent. I read somewhere that, because scoreID is a primary key, I could fix this by using the ANY_VALUE aggregate. I tried this:

SELECT ANY_VALUE(`masterScores`.`scoreID`), `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent 
FROM `masterScores` 
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE" 
GROUP BY `masterScores`.`classifierID` 
ORDER BY bestPercent DESC LIMIT 4

At first glance, this did seem to be working but it wasn't consistently returning scoreID values that matched the bestPercent values.

Again, the objective is to:

Only select 1 calculatedPercent and 1 scoreID value for each classifierID, subject to the indicated WHERE clauses. If not grouped by classifierID there could be anywhere from 0 to 400 rows for each classifierID that satisfy the WHERE clause so I do think GROUP BY is appropriate here.
Ensure that the selected calculatedPercent for each of the grouped classifierID is the highest numerical value among all options
Ensure that only 4 rows are selected and that they are the rows with for which the selected calculatedPercent value is highest.
Ensure that the 4 selected rows are ranked in descending order according to the calculatedPercent values.
Ensure that the scoreID value for each of the selected rows is actually representing the same row as the selected calculatedPercent (currently, this is the point at which my query is failing).

Here's a subset of data, as a example:

scoreID	classifierID	bestPercent
58007	42	66.60
63882	42	64.69
64685	54	64.31
58533	32	63.20
55867	42	62.28
66649	7	56.79
55392	12	50.28
58226	1	49.52
55349	7	41.10

Here's the desired output when I run the query:

scoreID	classifierID	bestPercent
58007	42	66.60
64685	54	64.31
58533	32	63.20
66649	7	56.79

Here's the actual output when I run the query:

scoreID	classifierID	bestPercent
55867	42	66.60
64685	54	64.31
58533	32	63.20
55349	7	56.79

As shown, the scoreID values for the first and fourth rows of actual output are not correct.

At this point, I welcome any advice.

EDIT 2 - Final working solution

WITH cte AS (
    SELECT scoreID, classifierID, calculatedPercent AS bestPercent,
           ROW_NUMBER() OVER (PARTITION BY classifierID ORDER BY calculatedPercent DESC, scoreID DESC) AS rn
    FROM masterScores WHERE memberID = 3516 AND eventDivision = "O" AND scoreUnusable != "TRUE"
)
SELECT scoreID, classifierID, bestPercent
FROM cte
WHERE rn = 1
ORDER BY bestPercent DESC
LIMIT 4

I was able to test this against a half-dozen cases that were giving problems and this solution solved every one of them. Again, thanks @Akina !

Going to mark this solved.

Not sure why the tables are not appearing correctly. They are appearing correctly in the preview and when went to edit the post they also still appeared correctly in that preview also. ?? — Eric Brockway, Jun 20 '23 at 15:35
*why the tables are not appearing correctly* Empty line before needed. — Akina, Jun 20 '23 at 15:36
You must get `classifierID` and `bestPercent` in subquery then use these values as a criteria for retrieving complete row from another table copy. Also - when more than one row have the same `MAX(calculatedPercent)` then you'll receive more than one output row. — Akina, Jun 20 '23 at 15:38
@Akina - Thanks, got the tables to appear. Regarding the subquery, can you provide an example for me of what you mean? Something like SELECT inside another SELECT ? — Eric Brockway, Jun 20 '23 at 15:39
@easleyfixed *I suspect if you changed that setting* Do this never! Non-deterministic synthactically incorrect query cannot be a solution. — Akina, Jun 20 '23 at 15:41
*```AND `masterScores`.`scoreUnusable` != "TRUE"```* - does this column is really string-type one and contains not boolean value but N-char string literals? — Akina, Jun 20 '23 at 15:43
@Akina - Sadly, yes. I inherited this from someone that viewed it as a hobby and it was full of worse things than this. They had all their dates as strings. Even worse, because it was international, they mixed formats dd-mm-yyyy with mm-dd-yyyy depending on the member's country. I cleaned up most of it but it's still not perfect. Will convert to Boolean later. — Eric Brockway, Jun 20 '23 at 15:50
Your MySQL version is not 8.1. There is no such version yet. The most recent release as of today is 8.0.33. Please run this query `SELECT VERSION();` and post the true version. — Bill Karwin, Jun 20 '23 at 20:03
@BillKarwin - Sorry about that. It's 8.0.33. The PHP version on the server is 8.1; I got mixed up. Thanks for catching my mistake. — Eric Brockway, Jun 21 '23 at 21:44

Akina · Accepted Answer · 2023-06-20T16:55:13.940

1

SELECT t1.scoreID, classifierID, calculatedPercent AS bestPercent 
FROM masterScores t1
NATURAL JOIN (
    SELECT classifierID, MAX(calculatedPercent) AS calculatedPercent
    FROM masterScores t2
    WHERE memberID = 3516 AND eventDivision = "O" AND scoreUnusable != "TRUE" 
    GROUP BY 1
    ORDER BY calculatedPercent DESC LIMIT 4
    ) t2

If (classifierID,calculatedPercent) is not unique then you may receive more than one row per classifierID. In this case you need in

SELECT MAX(t1.scoreID) AS scoreID, classifierID, calculatedPercent AS bestPercent 
FROM masterScores t1
NATURAL JOIN (
    SELECT classifierID, MAX(calculatedPercent) AS calculatedPercent
    FROM masterScores t2
    WHERE memberID = 3516 AND eventDivision = "O" AND scoreUnusable != "TRUE" 
    GROUP BY 1
    ORDER BY calculatedPercent DESC LIMIT 4
    ) t2
GROUP BY 2, 3

PS. If your MySQL version is 8+ then you must use ROW_NUMBER() in CTE instead of subquery.

EDIT - MySQL version is 8.1.

WITH cte AS (
    SELECT scoreID, classifierID, calculatedPercent AS bestPercent,
           ROW_NUMBER() OVER (PARTITION BY classifierID ORDER BY calculatedPercent DESC, scoreID DESC) AS rn
    FROM masterScores 
)
SELECT scoreID, classifierID, bestPercent
FROM cte
WHERE rn = 1

edited Jun 20 '23 at 16:55

answered Jun 20 '23 at 15:48

Akina

39,301
5
14
25

I had to add quotes around the O for the eventDivision. When I ran it then, it threw this error: #1054 - Unknown column 'bestPercent' in 'order clause' I tried changing the ORDER BY to calculated Percent and it would run, but then it selected the wrong `scoreID` values for classifers 32 and 7 – Eric Brockway Jun 20 '23 at 15:58
I believe the MySQL version is 8.1. I will edit my original question to add that detail. Thanks. Can you explain your ROW_NUMBER() comment? How would I do this in your example? – Eric Brockway Jun 20 '23 at 16:03
is this relevant to your comment about ROW_NUMBER() ? c.f. https://stackoverflow.com/questions/37069387/is-sql-row-number-order-guaranteed-when-a-cte-is-referenced-many-times – Eric Brockway Jun 20 '23 at 16:54
@EricBrockway *it threw this error* Fixed, test. – Akina Jun 20 '23 at 16:55
it shows the correct `scoreID` for `classifierID` 42 and 54 but it shows `scoreID` 59939 for `classifierID` 32 when it should be 58533 and it shows `scoreID` 68164 for `classifierID` 7 when it should be 66649. – Eric Brockway Jun 20 '23 at 16:59
@EricBrockway *Can you explain your ROW_NUMBER() comment?* Added. – Akina Jun 20 '23 at 17:00
@EricBrockway *but it shows* [Why should I provide a Minimal Reproducible Example for a very simple SQL query?](https://meta.stackoverflow.com/q/333952) then [Tips for asking a good Structured Query Language (SQL) question](//meta.stackoverflow.com/questions/271055), #5 and #3. – Akina Jun 20 '23 at 17:01
I'm sorry, I feel like I'm missing something. You linked a reference that I should provide a minimal reproducible example and I provided a table of data to reference, an example of desired output, and the output that I'm actually seeing. Isn't this exactly what is meant by a minimal reproducible example? Do you want a longer sample of the data set? Additional columns of data that are used in the WHERE clause but which aren't relevant to the output? I am grateful for your time and advice but I don't understand how I've failed to plainly and simply state my objectives... ?? – Eric Brockway Jun 20 '23 at 17:06
@EricBrockway Read 2nd link, specified paragraphs. You'd provide not tables but SQL-scripts which allows to REPRODUCE your tables and data, and test the solution. Accordingly the desired output must MATCH provided source data - for checking purposes. – Akina Jun 20 '23 at 17:15
Thank you for your time and advice. I was able to modify the CTE example that you provided and it solved everything. Thank you!! – Eric Brockway Jun 20 '23 at 17:25

MySQL SELECT with GROUP BY one field and ORDER BY a different field

1 Answers1