1

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me

"Grouping data is a simple process. The selected columns (the column list following the SELECT keyword in a query) are the columns that can be referenced in the GROUP BY clause. If a column is not found in the SELECT statement, it cannot be used in the GROUP BY clause. This is logical if you think about it—how can you group data on a report if the data is not displayed? "

How come when I ran this statement in MySQL it works?

select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
jantristanmilan
  • 4,188
  • 14
  • 53
  • 69
  • It's not the MySql-Workbench but MySql in general which supports it. – Tim Schmelter Mar 04 '14 at 16:41
  • @TimSchmelter oh, but why? isnt it "illogical" like the book said. or is there a reasoning behind it? – jantristanmilan Mar 04 '14 at 16:42
  • 1
    In other rdbms like MS SQL-Sever you need to include all columns in the `GROUP BY` or aggregate the column you want to select somehow(f.e. `MAX(SALARY)`). In MySql you don't have to, which means it returns arbitrary/random rows for these columns. Imho this can cause a lot of troubles which you don't even notice immediately. – Tim Schmelter Mar 04 '14 at 16:44
  • 1
    MySQL's extension of the SQL standard allows this. Some think that this is more of a bug than a feature. – Strawberry Mar 04 '14 at 16:45
  • Here are two good writeups regarding this "feature": http://rpbouman.blogspot.de/2007/05/debunking-group-by-myths.html and http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/ –  Mar 04 '14 at 16:52
  • possible duplicate of [Why does MySQL add a feature that conflicts with SQL standards?](http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards) – krokodilko Mar 04 '14 at 16:53
  • Notice that the book's title is "Teach Yourself SQL", not "Teach yourself MySQL". – Marcus Adams Mar 04 '14 at 17:32

2 Answers2

5

You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.

You can make MySQL enforce GROUP BY in a more standard way:

mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;

mysql> select EMP_ID, SALARY
  from EMPLOYEE_PAY_TBL
  group by BONUS;

ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
4

Because the book is wrong.

The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.

This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:

select t.*
from table t
where t.id in (select max(id)
               from table t
               group by thegroup
              );

(Note: There are other ways to write such a query, this is just an example.)

EDIT:

The query that you are suggesting:

select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;

would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.

What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:

select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;

Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The quote from the book indeed is wrong, but your example is different to the example given in the question - which is indeed invalid SQL according to the SQL standard (unless emp_id is unique - e.g. the PK - in EMPLOYEE_PAY_TBL table) –  Mar 04 '14 at 16:54
  • @a_horse_with_no_name . . . It addresses this part of the quote: "If a column is not found in the SELECT statement, it cannot be used in the GROUP BY clause". – Gordon Linoff Mar 04 '14 at 16:56