Why do i need to include columns present in SELECT clause again in GROUP BY clause for my result?

Question

I am working on mySQL problem and I want to group the result according to height range of the players.

However, when i write down

SELECT
  id, --displays error, needs to be included in GROUP BY
  player_name,
  CASE 
    WHEN height <= 50 THEN '0-50'
    WHEN height BETWEEN 50 AND 70 THEN '51-70'
    WHEN height BETWEEN 71 AND 80 THEN '71-80'
    ELSE '80+'
  END 
  AS height_range
  
FROM
  benn.college_football_players
GROUP BY 
  height_range 
ORDER BY
  height_range DESC

I get an error that id should be used in group by clause.

Why is this necessary? Why can't i group by the derived column 'height_range'?

I bet that even after you add `id` in `GROUP BY`, you still get the same error but this time on `player_name`. I suggest you clarify a bit more of what result you're expecting. For example, if you want to show how many players in the each of the `height_range`, then you don't need to do `SELECT id, player_name, CASE .... END AS height_range ... GROUP BY height_range;`.. instead, you should do `SELECT CASE .... END AS height_range, COUNT(*) AS total ... GROUP BY height_range;`. — FanoFN, Oct 20 '21 at 06:40

ProDec · Answer 1 · 2021-10-21T01:06:36.597

Execute query SELECT @@sql_mode to verify the ONLY_FULL_GROUP_BY is inside the query result.

From MySQL documentation for ONLY_FULL_GROUP_BY

Reject queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on (uniquely determined by) GROUP BY columns.

It needs to either remove the above server setting to allow the query execution or in a more recommended way, add aggregate functions to columns not in your GROUP BY, e.g. SELECT MAX(id), GROUP_CONCAT(id), etc.

Removing the ONLY_FULL_GROUP_BY setting and running such non-standard SQL is permitted by MySQL but not recommended. Refer to documentation

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses. Disabling ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the GROUP BY are the same for each group.

You can achieve the same effect without disabling ONLY_FULL_GROUP_BY by using ANY_VALUE() to refer to the nonaggregated column.

I don't disagree with your answer but I think you should highlight the importance of keeping the `ONLY_FULL_GROUP_BY` setting on and why it's on by default (since MySQL v5.7). Your answer seem to make _"removing the setting"_ is the first option. I think why is it that we can still disable the setting is to give us time to fix our codes that probably existed since before the setting was implemented as default. Therefore, removing the setting is a temporary workaround instead of a permanent solution. — FanoFN, Oct 21 '21 at 00:32
@FaNo_FN I said "a more recommended way", removing the setting is not first choice. — ProDec, Oct 21 '21 at 00:39
I can see that but it's not really emphasising the importance of keeping the setting on. OP is asking _"Why is this necessary?"_ and your answer is not really addressing the "Why" part. Probably this sentence _"In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want"_ from the [group by handling docs](https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html) can be included. — FanoFN, Oct 21 '21 at 00:49

Bohemian · Answer 2 · 2021-10-20T06:24:49.200

0

Delete the GROUP BY clause.

SELECT
  id, 
  player_name,
  CASE 
    WHEN height <= 50 THEN '0-50'
    WHEN height <= 70 THEN '51-70'
    WHEN height <= 80 THEN '71-80'
    ELSE '80+'
  END AS height_range
FROM benn.college_football_players
ORDER BY height_range DESC

You are not aggregating, which is when values from multiple rows are aggregated into one value, eg SUM().

You are rendering a single height value as a single height_range value (for every row), which cold be described as "expressing height as one of a group of ranges", but is not "grouping" in the SQL sense of the word.

Note simplification of cases - the first case whose condition is true is used, so no need to use BETWEEN.

edited Oct 20 '21 at 06:24

answered Oct 20 '21 at 06:18

Bohemian

412,405
93
575
722

what if i want to display output so that same height_range is displayed together? – Aparna Oct 20 '21 at 06:23
2

@Aparna what exactly do you mean by "displayed together"? How would you "display together" 100 players that were all in the "51-70" category? – Bohemian Oct 20 '21 at 06:26

score -1 · Answer 3 · answered Oct 20 '21 at 06:23

Why is this necessary? Why can't i group by the derived column 'height_range'?

You have a lot of rows with the same height_range.

You group these rows. The query must return one row for them.

Each source row have its own id, these values differs. But one output row may contain only one value. What of them? server doesn't know... and it generates an error.

Either remove id from the output list or use it in aggregate function (for example, SELECT MAX(id) AS id, ...). This instructs the server to return one definite value from all existing which is correct.

PS. After this fix you'll see the same problem with player_name output column.

PPS. Think about do you need in GROUP BY at all...

Why do i need to include columns present in SELECT clause again in GROUP BY clause for my result?

3 Answers3