0

I'm practicing SQL on SQLZOO, and I'm working on Joins. Question 11 of that section asks: "For every match involving 'POL', show the matchid, date and the number of goals scored."

So I tried the following code:

SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id 
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid

But it throws an error:

'gisq.game.mdate' isn't in GROUP BY

So the answer is:

SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id 
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid, mdate

My question is, why is it required to also include mdate in the GROUP BY clause if it's not part of the aggregate function? Thank you and sorry for the newbie question. Here is the table's format: https://sqlzoo.net/wiki/The_JOIN_operation

Dale K
  • 25,246
  • 15
  • 42
  • 71
Maxifolo
  • 3
  • 4
  • *"why is it required to also include mdate in the GROUP BY clause if it's not part of the aggregate function?"* Because otherwise how does SQL Server know which value to return? It doesn't, and so it errors. – Thom A Sep 19 '21 at 19:25
  • 1
    Does this answer your question? [Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause](https://stackoverflow.com/questions/13999817/reason-for-column-is-invalid-in-the-select-list-because-it-is-not-contained-in-e) – Charlieface Sep 19 '21 at 20:45
  • @DaleK I am not. Please elaborate. – Maxifolo Sep 19 '21 at 20:53

3 Answers3

2

The simple reason why it is required is because SQL requires that the GROUP BY columns and the SELECT columns need to be compatible. Those are the rules of the language.

Your query slightly simplified is:

SELECT matchid, mdate, COUNT(player)
FROM goal JOIN
     game
     ON matchid = id
WHERE 'POL' IN (team1, team2)
GROUP BY matchid;

The query is saying: Return one row per matchid -- because of the GROUP BY. But then which mdate gets returned? There could be multiple matches.

SQL requires that you be explicit about what you want. You might intend the most recent date, in which case you would use MAX(mdate). Or you might want a separate row for each date, in which case you would include it in the GROUP BY. Or you might intend something else. The query needs to be clear.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for your prompt answer, but I still do not fully comprehend. There is only one mdate per matchid (There can only be one match date for every match). So how come do you have to specify that you want a seperate row for each date? – Maxifolo Sep 19 '21 at 19:34
  • 2
    Because SQL Server doesn't know that, @Maxifolo . – Thom A Sep 19 '21 at 19:43
  • @Maxifolo - Because a SQL Query is only ***syntactically*** correct if it would bake sense ***regardless*** of the data or the constraints on the data. So, because this would ***not*** work for some possible data sources, it's not syntactically correct. *(This allows a query's syntax to be checked without referring to, or locking, any data or data structures.)* – MatBailie Sep 19 '21 at 19:47
  • @MatBailie Makes total sense now. So basically when using an aggregate function, all the other columns you are selecting should be included in the GROUP BY clause. It all makes sense now. – Maxifolo Sep 19 '21 at 19:58
  • Or aggregated themselves, @Maxifolo . You can have multiple aggregates in a single query. – Thom A Sep 19 '21 at 20:02
  • 1
    @MatBailie To be totally accurate, an SQL query planner *can* allow ungrouped columns that are dependent on a grouped column, and some database systems *do* allow other columns from a table when you're grouping by a Primary Key. However, the rule of thumb "either group or aggregate" is sensible to learn. – IMSoP Sep 19 '21 at 20:25
  • @imsop It's still ***syntactically*** non conformant SQL, and DBMS that do allow it (such as MySQL) are deprecating that functionality (MySQL 8 defaults to Not allowing it). Also, the question is paraphrased as "Why does X yield a syntax error?", that some DBMS don't give such an error doesn't help explain why in this instance it does. – MatBailie Sep 19 '21 at 21:10
0

Group By single column: Group By single column means, to place all the rows with same value of only that particular column in one group.

Group By multiple columns: Group by multiple column for example, GROUP BY column1, column2. This means to place all the rows with same values of both the columns column1 and column2 in one group

Since the question asks you to select date as well, you will have to put that in group by clause, lets suppose what if POL had multiple games on the same date. Keeping date in groupby clause can help you with that scenario.

user19930511
  • 299
  • 2
  • 15
  • Yes, but even if they had multiple games on the same date, the match ids would differ, and by grouping them by the id that would solve that issue, or am I missing something? Thank you – Maxifolo Sep 19 '21 at 19:38
0

When using aggregations and aggregating functions (COUNT, MAX, MIN, AVG, etc.) in the SELECT part of a query together with direct (not aggregated) columns, it's mandatory to repeat all not aggregated columns from the SELECT part in the GROUP BY part of the query. As the result, and this is what is required, all columns are aggregated, some of them by aggregating functions in the SELECT part of your query, the rest of them are aggregated in the GROUP BY clause.

mazooma
  • 116
  • 6