Why group by is forced on specific column but not on the others

Question

I have the following query: (sample schema here: https://www.db-fiddle.com/f/5e9gVC6oRidjYwigPWRKm/3)

SELECT 
  t.customer_id
, t.ticket_id
, c.combination_id
, c.possible_prize + c.confirmed_prize + coalesce(cb.bonus_amount,0) AS won_amount
, round(mul(o.value)::numeric,3) AS odds
FROM tickets t
JOIN combinations c ON c.ticket_id = t.ticket_id
LEFT JOIN combination_bonus cb ON cb.combination_id = c.combination_id 
JOIN outcomes o ON o.ticket_id = t.ticket_id AND o.outcome_id = ANY(c.outcomes)

GROUP BY 1,2,3, cb.bonus_amount
ORDER BY 1

Without + coalesce(cb.bonus_amount,0) it runs fine, why only this column need to grouped, not the two others in this equation?

Also if I would put this line into sum() the results would be totally wrong as it is going to be multiplied few times, and I don't get it why and how.

Would much appreciate explanation on both cases.

Please ask 1 (specific researched non-duplicate) question per post. Please put [mre] code in your post, not just at a link. PS When you get a result that you don't expect, after you isolate the first expression that returns what you don't expect, say what you expected & why with justification. If you did that you would have to refer to definitions & could see how your expectations are wrong. (Here, for group by.) Anyway, always read the documentation & DBMS manual for fucntionality you are using. These questions are faqs. — philipxy, Jun 11 '20 at 22:29
A common error when wanting some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, is to wrongly do all joining then all aggregating or to aggregate over previous aggregations. Write separate aggregations over appropriate rows and/or aggregate a case statement picking rows; always join on a key. Sometimes DISTINCT aggregation picks the right values after a non-key join. (Join on non-keys of 2 tables can give multiple rows for each key of a table.) [sum data from multiple tables](https://stackoverflow.com/q/2591390/3404097). — philipxy, Jun 11 '20 at 22:34

score 2 · Accepted Answer · answered Jun 11 '20 at 20:40

Without + coalesce(cb.bonus_amount,0) it runs fine, why only this column need to grouped, not the two others in this equation?

When it comes to columns c.possible_prize and c.confirmed_prize: you don't need these in the group by clause because that clause already contains c.combination_id (which is hidden behind positional parameter 3), which is the primary key of table c.

Postgres is one of the rare databases (it not the only one) that properly implements the concept of functionaly-dependent column: once you put the primary key of a table in a group by clause, you don't need to add other columns of the same table: the primary key uniquely identify a row.

On the other hand, you don't have the primary key of table cb in the group by clause. You would argue that you are bringing table cb with join condition on that column precisely, which somehow guarantees unicity:

LEFT JOIN combination_bonus cb ON cb.combination_id = c.combin

Well, Postgres is probably not that smart. Your query should work just fine if you put it in there, so:

GROUP BY 1,2,3, cb.combination_id

Yeah seems postgres is not that smart ;) after reading your answer before edit, I came up with same idea of putting cb.combination_id into group by and it works. What about 2nd part of the question? Why sum() creates wrong results there? — sh4rkyy, Jun 11 '20 at 20:49
Ok, I solved it. , c.possible_prize + c.confirmed_prize + coalesce(cb.bonus_amount,0) AS won_amount equals to: , (SELECT sum(c.possible_prize + c.confirmed_prize + coalesce(cb.bonus_amount,0) ) FROM combinations c2 LEFT JOIN combination_bonus cb ON cb.combination_id = c.combination_id WHERE c2.combination_id = c.combination_id) AS sum_won_amount — sh4rkyy, Jun 11 '20 at 21:04
But is it really the only solution to repeat all the FROMs, JOINS and clauses from the outter query in the subquery and link it with where clause to avoid multiplying of values in such cases? — sh4rkyy, Jun 11 '20 at 21:07

score 1 · Answer 2 · answered Jun 11 '20 at 20:37

This is a bit long for a comment.

SQL allows -- and Postgres support -- using group by on a unique or primary key and then selecting other columns without using aggregations. This is called functional dependency (the other columns are functionally dependent on the unique/primary key).

If your first query works, then it is using this functionality in Postgres -- based on combinations.combination_id being the primary key (or at least unique). However, combination_bonus has no key in the group by. And even if combination_bonus.combination_id is the primary key, Postgres may not be smart enough to use this information for functional dependence.

So, just include the entire expression coalesce(cb.bonus_amount, 0) in the group by. Or use an aggregation function.

Why group by is forced on specific column but not on the others

2 Answers2