Why does SQL query with GROUP BY produce more rows?

Question

I have the following table:

+------+-------+--------------------------------------+
| id   | rev   | content                              |
+------+-------+--------------------------------------+
| 1    | 1     | ...                                  |
| 2    | 1     | ...                                  |
| 1    | 2     | ...                                  |
| 1    | 3     | ...                                  |
+------+-------+--------------------------------------+

When I run the following query:

SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;

I get:

+------+----------+--------------------------------------+
| id   | maxrev   | content                              |
+------+----------+--------------------------------------+
| 1    |    3     | ...                                  |
| 2    |    1     | ...                                  |
+------+----------+--------------------------------------+

But if I remove the GROUP BY clause as follows:

SELECT id, MAX(rev) maxrev, content
FROM YourTable;

I get:

+------+----------+--------------------------------------+
| id   | maxrev   | content                              |
+------+----------+--------------------------------------+
| 1    |    3     | ...                                  |
+------+----------+--------------------------------------+

This is counter-intuitive to me because of the expectation that a GROUP BY would reduce the number of results by eliminating duplicate values. However, in the above case, introduction of the GROUP BY does the opposite. Is this because of the MAX() function, and if so, how?

PS: The table is based on the SO question here: SQL select only rows with max value on a column. I was trying to understand the answer to that question, and in the process, came across the above situation.

EDIT:

I got the above results on sqlfiddle.com using its MySQL 5.6 engine, with no customization/configuration.

@litelite No, if you don't use `GROUP BY`, it aggregates all the rows in the table without grouping. — Barmar, Aug 08 '17 at 19:51
@litelite Depends on your MySQL settings. Newer versions are more strict by default. — tadman, Aug 08 '17 at 19:51
Because you're using a version of mySQL which extends the group by allowing you to aggregrate on zero or all of the fields in the select not aggregrated.. Current version of mySQL would throw a syntax error provided the extensions have not been enabled. Since you're not grouping by content it picks a single value and since all values are now the same they fall in to the same group. When you eliminate all group by then the engine is fee to pick any id and content but only the max rev. — xQbert, Aug 08 '17 at 19:51
@tadman The new setting does the opposite, it mandates aggregation functions with `GROUP BY`. — Barmar, Aug 08 '17 at 19:51
@tadman Ok, so that was just my MySQL being strict on me. Thanks! — litelite, Aug 08 '17 at 19:52
Read: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html especially: If ONLY_FULL_GROUP_BY is disabled... because prior versions behaved this way. — xQbert, Aug 08 '17 at 19:59
@Barmar That's what I mean. The newer defaults make it complain if you fail to use a `GROUP BY` when using aggregate functions. Older versions didn't care by default, they'd just do *whatever*. — tadman, Aug 08 '17 at 19:59
OK, looks like you're right, you get an error `Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause` — Barmar, Aug 08 '17 at 20:05
The first query gets a different error: `YourTable.content isn't in GROUP BY` — Barmar, Aug 08 '17 at 20:05
Ok I love the attention/comments and feedback on this question. Great turnaround times and awesome information/discussion. It should be noted that this, to my knowledge is somewhat unique to mySQL. Most DB engines don't allow this. Though I thought I encountered something like this in Sybase; but could be wrong.. Sybase groups by all non-aggregated fields if it's left off... Neat. — xQbert, Aug 08 '17 at 20:19

Easton Bornemeier · Accepted Answer · 2017-08-08T20:27:14.813

5

It is utilizing your MAX() function dependent on your GROUP BY clause. So, for your first query, you are saying: Give me the maximum rev for each id, whereas the second is just saying Give me the maximum rev in general.

Thanks to xQbert:

This does NOT mean that you are getting the row with the max rev in the latter case. It will take values from anywhere in the selection to use for your id and content fields.

You can read more about how SQL handles the GROUP BY statement here: Documentation

edited Aug 08 '17 at 20:27

answered Aug 08 '17 at 19:51

Easton Bornemeier

1,918
8
22

and the engine is free to choose any single value from ID and content since they are in the select. (usually the first it encounters) – xQbert Aug 08 '17 at 19:54
@xQbert You are right. I changed my table and input specific values in to the content field instead of "...". The result for the second query (without the GROUP BY) is: `1, 3, 12`, when it should have been `1, 3, 13`. – Sabuncu Aug 08 '17 at 20:01
Thanks @xQbert I've added it to my answer – Easton Bornemeier Aug 08 '17 at 20:03
Thank you @EastonBornemeier, I appreciate your answer. – Sabuncu Aug 08 '17 at 20:04
@Sabuncu it gets a little more messed up/hecktick when joins occur and subqueries because the value may be encountered before an order is executed thus the value is encountered may vary on different executions based on the query plan followed. Best to ensure the ONLY_FULL_GROUP_BY option is enabled imo. – xQbert Aug 08 '17 at 20:05
1

@xQbert It's usually best just to specify a GROUP BY anyways, if possible by a primary key in order to ensure you are getting the whole row :) – Easton Bornemeier Aug 08 '17 at 20:06
2

I like the documentation on this: "In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause" it explains it well: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html – xQbert Aug 08 '17 at 20:09
@xQbert In fact, the values in case of the first query (with the GROUP BY) are also incorrect. The engine, as you said, just pulls random `content` value. Thanks for the link, these answers have been so useful. – Sabuncu Aug 08 '17 at 20:12
2

@easttonbornemeier Booo!!! the opportunity to use mySQL extensions when you fully understand them can result in a performance gain! For example if I know all the values will be the same for the un grouped by columns, then the engine doesn't have the overhead of grouping them and pickign one DOESN'T matter. since the engine doesn't have to group, then overhead is reduced. This is a FEATURE one with purpose; but seldom used correctly; thus the reason the default was changed. (End Commentary on the last line of your updated answer) – xQbert Aug 08 '17 at 20:16
1

The above comment was for informational and comedic purposes only. Any intent to inflame or cause harm to @EastonBornemeier was purely unintentional. ... Cheers! lack of group by... more like `how mySQL extends the standard group by` – xQbert Aug 08 '17 at 20:25
1

NOOOOOOOOooooooooo........ :P glad people have a sense of humor around here... even if it is more warped than mine. – xQbert Aug 08 '17 at 20:27

ScaisEdge · Answer 2 · 2017-08-08T19:54:31.347

0

This because you are using a version previuos that mysql 5.7 ..these version allow the use of aggregated d function and select column not in group by ... this produce impredicatble result for the not aggregated column .. in mysql 5.7 this beahvior is not allowed ... you have an error if you in select not aggregated function not mentioned in group by

the correct sintax is obviuosly the first

SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;

edited Aug 08 '17 at 19:54

answered Aug 08 '17 at 19:51

ScaisEdge

131,976
10
91
107

That's not what the new version requires. It prohibits non-aggregated columns when using `GROUP BY`, it doesn't require `GROUP BY` when using aggregation. Aggregating the entire table is standard SQL. – Barmar Aug 08 '17 at 19:52
@Barmar mixing aggregated and non aggregated column references is not standard SQL – Martin Smith Aug 08 '17 at 19:53
`MAX()` is an aggregation. – Barmar Aug 08 '17 at 19:54
@Barmar yes `MAX` is an aggregate but they also select the non aggregated columns `id` and `content` – Martin Smith Aug 08 '17 at 19:55
I understand that. But I don't think it has anything to do with the MySQL version. The `ONLY_FULL_GROUP_BY` SQL mode only affects what's allowed when you use `GROUP BY`. – Barmar Aug 08 '17 at 19:56
Ok but your original comment referred to standard SQL not whatever that option in MySql does. – Martin Smith Aug 08 '17 at 19:57
@scaisEdge Thank you for pointing out the difference in behavior based on version of MySQL. Upvoted. – Sabuncu Aug 08 '17 at 20:05
All 5.4.5 did I think was change the default of `ONLY_FULL_GROUP_BY` from being `OFF` by default to allow group by extensions to be enabled to being `ON` by default (with regards to this behavior of course) – xQbert Aug 08 '17 at 20:10

score 0 · Answer 3 · answered Aug 08 '17 at 20:00

0

SELECT id, MAX(rev) maxrev, content FROM YourTable
GROUP BY id;

When you run this, as there are 2 distinct ids in the table you get two rows in the result, one per id with the max value. The grouping happens on the id column.

SELECT id, MAX(rev) maxrev, content
FROM YourTable;

If you remove the group by clause, you only get one row in the result corresponding to the max value in the entire table. There is no grouping by id.

answered Aug 08 '17 at 20:00

bicyclecoder

41
6

1

Thanks I appreciate it. I believe your answer is similar to Easton Bornemeier's. – Sabuncu Aug 08 '17 at 20:03

Why does SQL query with GROUP BY produce more rows?

3 Answers3