EDIT 3 - MySQL version is 8.0.33.
EDIT 2 - See final working code at the bottom. Thanks @Akina !
I have a table of scores for a sporting event. The table has three relevant fields that I want to select -
scoreID
as a primary key valueclassifierID
that maps to the primary key of another table which has details about particular course layoutscalculatedPercent
that is the outcome of a particular event
The table also has three other fields that I use in a WHERE clause but those are incidental.
I need to generate a query that selects the four best values for calculatedPercent
with the stipulation that no classifierID
can be duplicated. I need to be able to capture the scoreID
for use in a later stage of the process.
Here was my first query:
SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, `masterScores`.`calculatedPercent`
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
ORDER BY `masterScores`.`calculatedPercent` DESC LIMIT 4
Initially I thought that this was great as it did select the scoreID
value for the rows that had the highest calculatedPercent
values for a given member. Then I noticed that several members had their first and second highest scores on the same course, which violates the requirement that classifierID
values not be duplicated.
I played around with SELECT DISTINCT for a bit but eventually realized that what I really needed was GROUP BY so I did some research and found Error related to only_full_group_by when executing a query in MySql but that didn't completely solve my issue.
I next tried:
SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
GROUP BY `masterScores`.`classifierID`
ORDER BY bestPercent DESC LIMIT 4
This me the following error message:
#1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column '.masterScores.calculatedPercent' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I considered using MIN and MAX for the masterScores
.scoreID
column but it wasn't matching expectations; the scoreID
primary key value wasn't always the one for the selected calculatedPercent
. I read somewhere that, because scoreID
is a primary key, I could fix this by using the ANY_VALUE aggregate. I tried this:
SELECT ANY_VALUE(`masterScores`.`scoreID`), `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
GROUP BY `masterScores`.`classifierID`
ORDER BY bestPercent DESC LIMIT 4
At first glance, this did seem to be working but it wasn't consistently returning scoreID
values that matched the bestPercent values.
Again, the objective is to:
Only select 1
calculatedPercent
and 1scoreID
value for eachclassifierID
, subject to the indicated WHERE clauses. If not grouped byclassifierID
there could be anywhere from 0 to 400 rows for eachclassifierID
that satisfy the WHERE clause so I do think GROUP BY is appropriate here.Ensure that the selected
calculatedPercent
for each of the groupedclassifierID
is the highest numerical value among all optionsEnsure that only 4 rows are selected and that they are the rows with for which the selected
calculatedPercent
value is highest.Ensure that the 4 selected rows are ranked in descending order according to the
calculatedPercent
values.Ensure that the
scoreID
value for each of the selected rows is actually representing the same row as the selectedcalculatedPercent
(currently, this is the point at which my query is failing).
Here's a subset of data, as a example:
scoreID | classifierID | bestPercent |
---|---|---|
58007 | 42 | 66.60 |
63882 | 42 | 64.69 |
64685 | 54 | 64.31 |
58533 | 32 | 63.20 |
55867 | 42 | 62.28 |
66649 | 7 | 56.79 |
55392 | 12 | 50.28 |
58226 | 1 | 49.52 |
55349 | 7 | 41.10 |
Here's the desired output when I run the query:
scoreID | classifierID | bestPercent |
---|---|---|
58007 | 42 | 66.60 |
64685 | 54 | 64.31 |
58533 | 32 | 63.20 |
66649 | 7 | 56.79 |
Here's the actual output when I run the query:
scoreID | classifierID | bestPercent |
---|---|---|
55867 | 42 | 66.60 |
64685 | 54 | 64.31 |
58533 | 32 | 63.20 |
55349 | 7 | 56.79 |
As shown, the scoreID
values for the first and fourth rows of actual output are not correct.
At this point, I welcome any advice.
EDIT 2 - Final working solution
WITH cte AS (
SELECT scoreID, classifierID, calculatedPercent AS bestPercent,
ROW_NUMBER() OVER (PARTITION BY classifierID ORDER BY calculatedPercent DESC, scoreID DESC) AS rn
FROM masterScores WHERE memberID = 3516 AND eventDivision = "O" AND scoreUnusable != "TRUE"
)
SELECT scoreID, classifierID, bestPercent
FROM cte
WHERE rn = 1
ORDER BY bestPercent DESC
LIMIT 4
I was able to test this against a half-dozen cases that were giving problems and this solution solved every one of them. Again, thanks @Akina !
Going to mark this solved.