0

I want to 'group by' beers so that they are grouped together, with the respect aggregate rating and tasters (people who review the beers) listed in separate columns.

Here is my code:

create or replace view tasters_avg_ratings1
as
select a.taster as taster, a.beer as beer, round(avg(a.rating),1) as rating 
from allratings a
group by beer, taster
;

Yet my output looks like this:

beers=# select * from tasters_avg_ratings1;
 taster |          beer          | rating 
--------+------------------------+--------
 Peter  | XXXX                   |    5.0
 Sarah  | James Squire Pilsener  |    3.0
 Raghu  | Sierra Nevada Pale Ale |    3.0
 Hector | Fosters                |    3.0
 John   | Chimay Red             |    3.0
 John   | Sierra Nevada Pale Ale |    5.0
 Geoff  | James Squire Pilsener  |    4.0
 Ramez  | Sierra Nevada Pale Ale |    4.0
 John   | 80/-                   |    4.0
 John   | Rasputin               |    4.0
 Adam   | Old                    |    4.0
 John   | Crown Lager            |    2.0
 Jeff   | Sierra Nevada Pale Ale |    4.0
 Sarah  | Burragorang Bock       |    4.0
 Sarah  | Scharer's Lager        |    3.0
 Sarah  | New                    |    2.0
 Geoff  | Redback                |    4.0
 Adam   | Victoria Bitter        |    1.0
 Sarah  | Victoria Bitter        |    1.0
 Raghu  | Rasputin               |    3.0
 Ramez  | Bigfoot Barley Wine    |    3.0
 Hector | Sierra Nevada Pale Ale |    4.0
 Sarah  | Old                    |    3.0
 Jeff   | Burragorang Bock       |    3.0
 John   | Empire                 |    3.0
 Sarah  | James Squire Amber Ale |    3.0
 Rose   | Redback                |    5.0
 Geoff  | Empire                 |    3.0
 Adam   | New                    |    1.0
 Jeff   | Rasputin               |    1.0
 Raghu  | Old Tire               |    5.0
 John   | Victoria Bitter        |    1.0
(32 rows)

As you can see, the beers are NOT grouped together. Ideally for example, the 'Victoria Bitter' beers should be displayed as a group, not separated.

The desired result is achieved using 'order by'. For example:

create or replace view tasters_avg_ratings1
as
select a.taster as taster, a.beer as beer, round(avg(a.rating),1) as rating 
from allratings a
group by beer, taster
order by a.beer
;

OUTPUT:

beers=# select * from tasters_avg_ratings1;
 taster |          beer          | rating 
--------+------------------------+--------
 John   | 80/-                   |    4.0
 Ramez  | Bigfoot Barley Wine    |    3.0
 Jeff   | Burragorang Bock       |    3.0
 Sarah  | Burragorang Bock       |    4.0
 John   | Chimay Red             |    3.0
 John   | Crown Lager            |    2.0
 Geoff  | Empire                 |    3.0
 John   | Empire                 |    3.0
 Hector | Fosters                |    3.0
 Sarah  | James Squire Amber Ale |    3.0
 Geoff  | James Squire Pilsener  |    4.0
 Sarah  | James Squire Pilsener  |    3.0
 Adam   | New                    |    1.0
 Sarah  | New                    |    2.0
 Adam   | Old                    |    4.0
 Sarah  | Old                    |    3.0
 Raghu  | Old Tire               |    5.0
 Jeff   | Rasputin               |    1.0
 John   | Rasputin               |    4.0
 Raghu  | Rasputin               |    3.0
 Geoff  | Redback                |    4.0
 Rose   | Redback                |    5.0
 Sarah  | Scharer's Lager        |    3.0
 Hector | Sierra Nevada Pale Ale |    4.0
 Jeff   | Sierra Nevada Pale Ale |    4.0
 John   | Sierra Nevada Pale Ale |    5.0
 Raghu  | Sierra Nevada Pale Ale |    3.0
 Ramez  | Sierra Nevada Pale Ale |    4.0
 Adam   | Victoria Bitter        |    1.0
 John   | Victoria Bitter        |    1.0
 Sarah  | Victoria Bitter        |    1.0
 Peter  | XXXX                   |    5.0
(32 rows)

So while I know order by achieves my result, why doesn't 'group by' do the same thing? It is frustrating because I have seen numerous examples on the internet using 'group by' and succeeding especially in cases similar to mine with non-aggregate and aggregate columns. For example: https://learnsql.com/blog/error-with-group-by/ , at tip #3.

Any help would be appreciated, thanks!

  • 1
    Generally, an ORDER BY clause is required when a specific sorting should be applied. Using GROUP BY sets GROUPING (therefore it's named GROUP BY, not ORDER BY), so ORDER BY is still necessary. – Jonas Metzler Nov 22 '22 at 13:48
  • Hey Jonas - could you please elaborate on what you mean by 'group by' sets 'grouping'? Does this mean it will not necessarily place all beers sharing the same name into a consecutive block in the table? What about the example I linked on the website. Thanks – Rayyan Khan Nov 22 '22 at 13:50
  • GROUP BY allows to sum/count/etc. values (you call this aggregation) GROUPED BY specific columns. In your example, the average rating will be built per taster and beer. Without GROUP BY, the average rating would not differ, but always be calculated based on the entire table. GROUP BY has nothing to do with sorting. Sorting requires an ORDER BY clause. There is nothing further to say about that. This question has been answered years ago: https://stackoverflow.com/questions/28149876/does-group-by-automatically-guarantee-order-by – Jonas Metzler Nov 22 '22 at 13:57
  • You need to specify the logic to assemble those subgroups (beer, no-beer, etc.). You need to either create a new column for it, or to define a logic to compute them on the fly. – The Impaler Nov 22 '22 at 13:58
  • You already know how to achieve ordering. So do it. Why volunteer to be frustrated by nonsense? – jjanes Nov 22 '22 at 16:03
  • @jjanes Hi, I don't believe that is a very useful inquiry and sounds almost condescending. As stated in my post, I know order-by achieves my result but I'm clarifying why group-by does not despite an internet source showing it operating in the same way. Others have clarified why this is the case in a helpful and benevolent manner so I have the answer I need. If you still struggle to understand my question then I am more than happy to elaborate! – Rayyan Khan Nov 22 '22 at 22:10

3 Answers3

2

GROUP BY is only being used to compute the aggregate value (average rating in this case). It doesn't have anything to do with the ordering of the results when they are displayed. As you have mentioned, you need to use ORDER BY to get the desired ordering.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
  • Makes sense. So it seems group by is just for aggregating rather than presenting the info? From the replies here, presenting the information in an ordered manner seems to be a nice side effect / bonus of the feature but not its primary one. – Rayyan Khan Nov 22 '22 at 21:34
1

Group By should be used to group rows that have the same value for the specified column, Full explanation here https://stackoverflow.com/a/2421441.

In your case, you want to group beers with the respect to aggregate rating and tasters, so you need to GROUP BY a.beers and order by the first and the third column. Thus the view should be like this:

create or replace view tasters_avg_ratings1
as
select a.taster as taster, a.beer as beer, round(avg(a.rating),1) as rating 
from allratings a
group by a.beer
order by 1,3;

1 and 3 are the ordinal positions of columns that appear in the select list.

  • This makes sense, thanks. But also when I try your view I get this error: psql:prac6_me.sql:111: ERROR: column "a.taster" must appear in the GROUP BY clause or be used in an aggregate function LINE 3: select a.taster as taster, a.beer as beer, round(avg(a.ratin... I'm on psql if that helps. – Rayyan Khan Nov 22 '22 at 21:33
  • This is a common aggregation problem, the selected fields must appear in the GROUP BY clause you can fix this by adding a.taster to your group by clause or you must calculate the aggregate in a sub-query and then join it with itself to get the additional column. See https://stackoverflow.com/a/19602031 for a more detailed example. – Ali Ibrahim Nov 23 '22 at 09:22
1

The SQL standard defines that a SELECT statement without an ORDER BY may return the resulting rows in an arbitrary order. There is a good explanation of why it might look different at first.

GROUP BY is for grouping and aggregating related tuples together. A trivial implementation for grouping is of course sorting the data first and then working your way from top to bottom and aggregate related tuples. If your database chooses to use such an implementation chances are high you will receive an ordered result set. But there are other implementations possible and your database may change to using one of those at any time. And if it does you will receive the same rows but in a different order.

So in short, if you want your result set to be ordered use an ORDER BY.

Regarding the example you mentioned it is perhaps a bit misleading. The result is ordered by the columns it is also grouped by. This is a possible ordering and given the thoughts before probably quite a likely one. But since the query is without an ORDER BY this is just coincidence and not guaranteed.

SebDieBln
  • 3,303
  • 1
  • 7
  • 21
  • 1
    Thanks, appreciate the reply. So it seems grouping does not guarantee ordering and the two are different things. Grouping is used primarily for creating aggregate columns for related tuples and it's just a pleasant coincidence that it sometimes presents the data ordered – Rayyan Khan Nov 22 '22 at 21:36