365

I got an error -

Column 'Employee.EmpID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by loc.LocationID 

This situation fits into the answer given by Bill Karwin.

correction for above, fits into answer by ExactaBox -

select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by loc.LocationID 

ORIGINAL QUESTION -

For the SQL query -

select *
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by (loc.LocationID)

I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.

I think I have a partial explanation for my own question. Tell me if its ok -

To group all employees that work in the same location we have to first mention the LocationID.

Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, ie we should SUM() the employees working in that location. Why do we do it the latter way, i am not sure. So, this explains the "it is not contained in either an aggregate function" part of the error.

What is the explanation for the GROUP BY clause part of the error ?

dreftymac
  • 31,404
  • 26
  • 119
  • 182
david blaine
  • 5,683
  • 12
  • 46
  • 55
  • 1
    When you use group by you have to select the thing specifically that you are grouping – Alex W Dec 22 '12 at 03:34
  • 2
    You are probably confusing `GROUP BY` with `ORDER BY`. Grouping is used to aggregate columns; ordering is sorting the result. – BellevueBob Dec 22 '12 at 03:55

4 Answers4

708

Suppose I have the following table T:

a   b
--------
1   abc
1   def
1   ghi
2   jkl
2   mno
2   pqr

And I do the following query:

SELECT a, b
FROM T
GROUP BY a

The output should have two rows, one row where a=1 and a second row where a=2.

But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.

This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).

Fixing it might look like this:

SELECT a, MAX(b) AS x
FROM T
GROUP BY a

Now it's clear that you want the following result:

a   x
--------
1   ghi
2   pqr
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 4
    Bill - an aside - you said that its not "clear which value to choose for b in each group". Why doesn't SQL put b1,b2,b3 all together in that column ? – david blaine Dec 22 '12 at 04:51
  • 25
    @davidblaine, MySQL has a function GROUP_CONCAT() for that. http://dev.mysql.com/doc/refman/5.5/en/group-by-functions.html#function_group-concat But in standard SQL, each column should contain only one value. That's fundamental to relational theory too. – Bill Karwin Dec 22 '12 at 16:45
  • 61
    Love it when folks take the time to explain using first principles. Excellent explanation Bill. Thanks. – Klaus Nji Feb 21 '14 at 19:54
  • 2
    I think this is negative point of MSSQL because when we have some LEFT JOINs and group by the PK of the right table then reasonably we must be able to get the columns of the right table easily without ANY AMBIGUOUS situation (MySQL handles this). – AbbasAli Hashemian Aug 09 '16 at 07:53
  • 1
    Also AFAIK MsSQL had some trick like GROUP_CONCAT too... hmm... something like STUFF(( SELECT (' - '+ColName) AS ColName .... FOR XML PATH('') ), 1, 3, ''). – AbbasAli Hashemian Aug 09 '16 at 08:00
  • 2
    In SQL Server 2017 you can use String_Agg to solve the above problem – Postlagerkarte Aug 03 '17 at 09:43
  • 3
    @Postlagerkarte, that's good to hear that they finally made an aggregate function to do this common task. It always seemed like an ugly hack to use FOR XML PATH. – Bill Karwin Aug 03 '17 at 16:42
  • 1
    Every couple of years when I need to know this stuff, I find myself coming back to this exact answer. This explains it perfectly. – David Klempfner Jan 22 '21 at 05:01
97

Your query will work in MYSQL if you set to disable ONLY_FULL_GROUP_BY server mode (and by default It is). But in this case, you are using different RDBMS. So to make your query work, add all non-aggregated columns to your GROUP BY clause, eg

SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2

Non-Aggregated columns means the column is not pass into aggregated functions like SUM, MAX, COUNT, etc..

John Woo
  • 258,903
  • 69
  • 498
  • 492
19

Basically, what this error is saying is that if you are going to use the GROUP BY clause, then your result is going to be a relation/table with a row for each group, so in your SELECT statement you can only "select" the column that you are grouping by and use aggregate functions on that column because the other columns will not appear in the resulting table.

Alex W
  • 37,233
  • 13
  • 109
  • 109
  • 1
    Good summary. I have posted a complementary answer here - https://stackoverflow.com/a/63545571/2806819 – k_rollo Aug 23 '20 at 09:52
19

"All I want to do is join the tables and then group all the employees in a particular location together."

It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it, obviously I don't know what locations you have).

In that case, lose the GROUP BY statement. All you need is ORDER BY loc.LocationID

ExactaBox
  • 3,235
  • 16
  • 27
  • 3
    Thats not what i meant. order by will put all employees in the same city one after the other. It will do this for all the cities. I needed to know *how many* employees in a city instead of *which employees* in a city. makes sense ? – david blaine Dec 22 '12 at 04:20
  • 7
    then replace the first line of your original example with: SELECT loc.LocationID, COUNT(*) – ExactaBox Dec 22 '12 at 04:22
  • 3
    yeah, thats what i wanted. Thanks. Looks like i did not frame my question properly. But, Bill Karwin's answer made me understand the reason behind the error. So I accepted his. Thanks again. – david blaine Dec 22 '12 at 04:39