Why PostgreSQL is not accepting while group by on one table and selecting towards another tables

Question

I am using postgreSQL version PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-22ubuntu5) 4.7.2, 64-bit,my question is am joining two tables,Let name it as temp1 and temp2 ,here i need to join this two table

Table structure is

marks_map
marks   int
stud_id  int

student
stud_id int
class_id int

here my query

select class_id,stud_id,count(marks) 
from student as s 
inner join marks_map as m on (s.stud_id=m.stud_id) group by stud_id

Here i get error as

ERROR:  column "s.class_id" must appear in the GROUP BY clause or be used in an aggregate function

Why does this error happen? If I use class_id in group by it's running successfully.

That's how Group By works: if you don't Group By class_id how would it know which of multiple class_id values you wanted? — IMSoP, Sep 02 '13 at 08:28
Hi,in my actual query i need to join more then 3 tables,this query is for showing i get a bug like this. — kumar, Sep 02 '13 at 08:31
You probably also want `sum(marks)`, not `count(marks)`, by the way. — IMSoP, Sep 02 '13 at 08:32
This has nothing to do with how many tables are joined. You can only mention in the `select` clause columns that you've grouped by, or aggregate functions like `sum`, `count`, `min`, etc. Mentioning other columns makes no logical sense. — IMSoP, Sep 02 '13 at 08:37
It's ***not*** a bug. This is how `GROUP BY` works in a SQL database. (don't be fooled by the way MySQL handles that - it doesn't complain and simply returns random results in case `GROUP BY` isn't used properly). Maybe you should tell us what output you expect, then we can suggest a solution to the real problem. — , Sep 02 '13 at 08:52

score 1 · Accepted Answer · answered Sep 02 '13 at 08:29

1

You have to add the class_id attribute to your group by clause because in your select part of the statement there is no aggregation function over this attribue.

In GROUP BY statments you have to add all the attributes over which you haven't aggregated after the GROUP BY clause.

For example:

SELECT
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3, sum(attr4)
FROM
table
GROUP BY
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3

answered Sep 02 '13 at 08:29

Mirco Widmer

2,139
1
20
44

@micro.widmer,yes you are correct,why this so? Their is any other way which will lead me to select a column which is not used in group by and aggregate function – kumar Sep 02 '13 at 09:18
As others already commented, that's the way `GROUP BY` works. What exactly doesn't work as you expect/need it? – Mirco Widmer Sep 02 '13 at 09:32
@mirco.widmer Well, if there's more than one `class_id` for a given `student_id`, what do you want it to do? Pick one at random and show it? (If so, you want `DISTINCT ON`; see the PostgreSQL manual). Or list all student ID / class ID combinations - in which case you must `GROUP BY` both student ID and class ID? – Craig Ringer Sep 02 '13 at 11:41
If there is more than one `class_id` for a given `student_id`, then there is another problem because the `marks_map` doesn't include the class_id. Daniel Vérité described the problem pretty good. The problem is in the database design. – Mirco Widmer Sep 02 '13 at 16:23

score 0 · Answer 2 · answered Sep 02 '13 at 09:57

That's the way group by work.

You can check your data like

select
    array_agg(class_id) as arr_class_id,
    stud_id, count(marks) 
from student as s 
   inner join marks_map as m on (s.stud_id=m.stud_id)
group by stud_id

and see how much class_id you have for each group. Sometimes your class_id is dependant from stud_id (you have only one elemnet in array for each group), so you can use dummy aggregate like:

select
    max(class_id) as class_id,
    stud_id, count(marks) 
from student as s 
   inner join marks_map as m on (s.stud_id=m.stud_id)
group by stud_id

score 0 · Answer 3 · answered Sep 02 '13 at 11:59

You should be able to understand the problem on a simplified case that doesn't even involve a JOIN.

The query SELECT x,[other columns] GROUP BY x expresses the fact that for every distinct value of x, the [other columns] must be output with only one row for every x.

Now looking at a simplified example where the student table has two entries:

stud_id=1, class_id=1
stud_id=1, class_id=2

And we ask for SELECT stud_id,class_id FROM student GROUP BY class_id.

There is only one distinct value of stud_id, which is 1.

So we're telling the SQL engine, give me one row with stud_id=1 and the value of class_id that comes with it. And the problem is that there is not one, but two such values, 1 and 2. So which one to choose? Instead of choosing randomly, the SQL engine yields an error saying the question is conceptually bogus in the first place, because there's no rule that says each distinct value of stud_id has its own corresponding class_id.

On the other hand, if the non-GROUP'ed output columns are aggregate functions that transform a series of values into just one, like min, max, or count, then they provide the missing rules that say how to get only one value from several. That's why the SQL engine is OK with, for instance: SELECT stud_id,count(class_id) FROM student GROUP BY stud_id;.

Also, when faced with the error column "somecolumn" must appear in the GROUP BY clause, you don't want to just add columns to the GROUP BY until the error goes away, as if it was purely a syntax problem. It's a semantic problem, and each column added to the GROUP BY changes the sense of the question submitted to the SQL engine.

That is, GROUP BY x,y means for each distinct value of the (x,y) couple. It does not mean GROUP BY x, and hey, since it leads to an error, let's throw in the y as well!

Why PostgreSQL is not accepting while group by on one table and selecting towards another tables

3 Answers3