49

I need a bit of help straightening out something, I know it's a very easy easy question but it's something that is slightly confusing me in SQL.

This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.

SELECT * 
FROM order_details 
GROUP BY order_no

However this one does work

SELECT SUM(order_price)
FROM order_details
GROUP BY order_no

Just to concrete my understanding on this.... Assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?

In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?

Dale K
  • 25,246
  • 15
  • 42
  • 71
Chris
  • 7,996
  • 11
  • 66
  • 98

5 Answers5

53

In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.

You can also do:

SELECT order_no, SUM(order_price)
  FROM order_details
 GROUP BY order_no;

That will work because all the non-aggregate columns are listed in the GROUP BY clause.

You could do something like:

SELECT order_no, order_price, MAX(order_item)
  FROM order_details
 GROUP BY order_no, order_price;

This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • OK, so after GROUP BY is executed, all the individual tuples for a particular order_no are still accessible by the aggregate functions. That is, the SUM function can still access the order_price attribute of each tuple, even though those tuples have been grouped together by order_no? – Chris Jan 06 '11 at 05:29
  • 5
    @Chris: yes, more or less. You can think of the GROUP BY partitioning the rows in the table into groups; each group has the same set of values for the columns listed in the GROUP BY clause. The aggregates then operate on any of the columns specified, with the aggregate calculated over the rows in the group. The result then consists of one row per group, with the GROUP BY column values plus the associated aggregates. Hummm...I hope that's clear... – Jonathan Leffler Jan 06 '11 at 05:32
  • Yep that's completely cleared it up as far as I can tell. Thanks a lot Jonathan and thanks to the other guys who answered! – Chris Jan 06 '11 at 07:01
  • @JonathanLeffler Thanks for the explanation. I grew up on MySQL, and when I began working with Oracle, I was frustrated that Oracle's GROUP BY was stupid for having to include every column. Now I understand that Oracle's not stupid; it's normal. It's MySQL that just tries to be un-standardly fancy. :-) – Wiseguy Nov 17 '11 at 04:49
  • 1
    @JonathanLeffler - not your answer, but your first comment below it is beautifully clear. All by itself it clears up something I never quite "got" about SQL. – teapot7 Sep 27 '13 at 00:11
  • 1
    FWIW, it isn't only MySQL that doesn't require all SELECT columns to be rigorously in the GROUP BY clause. H2, PostgreSQL are two more that don't insist. – Neil Stockton Oct 12 '15 at 07:42
  • @NeilStockton And also SQLite if I recall correctly. – Michael Nov 08 '18 at 23:21
3

The order in which SQL is written is not the same order it is executed.

Normally, you would write SQL like this:

SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY

Under the hood, SQL is executed like this:

FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

Reason why you need to put all the non-aggregate columns in SELECT to the GROUP BY is the top-down behaviour in programming. You cannot call something you have not declared yet.

Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution

k_rollo
  • 5,304
  • 16
  • 63
  • 95
2
SELECT * 
FROM order_details 
GROUP BY order_no

In the above query you are selecting all the columns because of that its throwing an error not group by something like.. to avoid that you have to mention all the columns whichever in select statement all columns must be in group by clause..

 SELECT * 
    FROM order_details 
    GROUP BY order_no,order_details,etc

etc it means all the columns from order_details table.

DoOrDie
  • 315
  • 3
  • 12
0

To use group by clause you have to mention all the columns from select statement in to group by clause but not the column from aggregate function.

TO do this instead of group by you can use partition by clause you can use only one port to group as a partition by.

you can also make it as partition by 1

DoOrDie
  • 315
  • 3
  • 12
0

use Common table expression(CTE) to avoid this issue.

multiple CTes also come handy, pasting a case where I have used...maybe helpful

with ranked_cte1 as  
( select r.mov_id,DENSE_RANK() over ( order by r.rev_stars desc )as rankked from ratings r  ),

ranked_cte2 as  ( select * from movie where mov_id=(select mov_id from ranked_cte1 where rankked=7 ) )  select * from ranked_cte2

 select * from movie where mov_id=902
Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
tomdxb0004
  • 61
  • 1
  • 6