How to get columns which are not in GROUP BY?

Question

I have a Postgresql database where I have these two tables.

shipping_method

id | name     | abbrev
---+----------+-------
1  | Standard | ST
2  | Express  | EX

shipping_details:

id	shipping_method_id	estimated_time_min	estimated_time_max	price
2	1	02:00:00	04:00:00	230
3	2	00:03:00	01:00:00	500
4	1	02:00:00	04:00:00	1230
5	1	02:00:00	04:00:00	850
6	2	01:00:00	02:00:00	1785

My goal is to fetch the most expensive shipping details per shipping method (for a specific product [not in OP]).

So far, I wrote this query:

SELECT 
    sm.id, sm.name, MAX(sd.price) AS max_price
FROM 
    shipping_details AS sd 
LEFT JOIN 
    shipping_method AS sm ON sm.id = sd.shipping_method_id
GROUP BY 
    sm.id

which returns:

id | name     | max_price
---+----------+---------
2  | Express  | 1785
1  | Standard | 1230

Through that query, I am not able to get the shipping_details columns without putting them in GROUP BY clause. I mainly need shipping details for each specific shipping method that has the higher price.

How can I achieve that?

score 2 · Accepted Answer · edited Oct 01 '21 at 15:47

2

Use DISTINCT ON:

SELECT DISTINCT ON (sm.id) sm.id, sm.name, sd.price AS max_price
FROM shipping_details AS sd 
LEFT JOIN shipping_method AS sm
    ON sm.id = sd.shipping_method_id
ORDER BY sm.id, sd.price DESC;

The above logic will return the shipping method having the max price.

edited Oct 01 '21 at 15:47

Lemayzeur

8,297
3
23
50

answered Oct 01 '21 at 15:42

Tim Biegeleisen

502,043
27
286
360

It works ;-). However, can you explain why you have this `(sm.id) sm.id`. Why you have it twice? – Lemayzeur Oct 01 '21 at 15:51
@Lemayzeur The `(sm.id)` term tells Postgres that you want to find the max record per group of records belonging to the same shipping method. The `ORDER BY` clause tells Postgres to find the one record per group having the max price. – Tim Biegeleisen Oct 01 '21 at 15:57
Perfect. This solution works well and it takes less time to execute – Lemayzeur Oct 01 '21 at 16:00

Erwin Brandstetter · Answer 2 · 2021-12-01T21:01:04.463

To get additional columns from each row with the highest price in shipping_details, use DISTINCT ON:

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN (
   SELECT DISTINCT ON (shipping_method_id)
          shipping_method_id AS id, price AS max_price
      --  add more columns as you like
   FROM   shipping_details sd
   ORDER  BY sd.shipping_method_id DESC, sd.price DESC, sd.id  -- ①
   ) sd USING (id);

As long as all rows from shipping_details are involved, it's typically fastest to aggregate first, and then join. (Not when the table holds many additional rows that are eliminated by the join.)

If price can be NULL make that ORDER BY ... price DESC NULLS LAST - else NULL sorts on top in descending order. Be sure to match an existing index.

① An index on shipping_details (shipping_method_id, prize) makes it fast if the table shipping_details is big. Or an index on (shipping_method_id DESC, prize DESC). It's important that the sort order of both columns are in sync with the query. Postgres can scan the index forwards or backwards, but for multicolumn indices the sort order of all columns needs to be in sync with the query. See:

Optimizing queries on a range of timestamps (two columns)

One more issue to note: If there can be multiple shipping details with the highest price, you get an arbitrary pick which can change with every execution, typically after writes to involved rows. To get a stable, deterministic result, add more ORDER BY expressions as tiebreaker. Like sd.id I appended above. Then the smallest id is the winner, consistently.
If there are many ties like that, it even pays to append id to the index. Like (shipping_method_id, prize, id DESC) - note the opposite sort order for id!

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN LATERAL (
   SELECT sd.price AS max_price, id AS shipping_details_id
   FROM   shipping_details sd
   WHERE  sd.shipping_method_id = sm.id
   ORDER  BY sd.price DESC NULLS LAST
   LIMIT  1
   ) sd ON true;

What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group? (note performance of query nr. 5!)

Yes, It works. I appreciate your answer. But [this answer](https://stackoverflow.com/a/69408600/5644965) takes less time to execute. Thank you! — Lemayzeur, Oct 01 '21 at 16:11
That's unexpected. This one should be a bit faster - unless there are additional rows in `shipping_details` that are not needed for the query (`shipping_method_id` can be NULL or no FK constraint). Maybe `NULLS LAST` makes a difference. Can be removed if `price` is `NOT NULL`. As long as you match an existing index. — Erwin Brandstetter, Oct 01 '21 at 16:25
I only have added `id as shipping_details_id` where you put the comment. The execution time becomes a bit closer with your recent edit — Lemayzeur, Oct 01 '21 at 16:38
@Lemayzeur: I think I pinned down the performance issue. Consider the update. — Erwin Brandstetter, Oct 01 '21 at 17:04
Can you help me with [this question](https://stackoverflow.com/questions/70188086/what-makes-that-postgresql-query-slow)? Thanks — Lemayzeur, Dec 01 '21 at 19:04

eshirvana · Answer 3 · 2021-10-01T16:51:55.153

0

here is one way using window function:

select *
from shipping_method sm 
join (
      select *, row_number() over (partition by shipping_method_id order by price desc) rn  
      from shipping_details sd) t 
on sd.shipping_method_id = t.id
and rn = 1 ;

edited Oct 01 '21 at 16:51

answered Oct 01 '21 at 15:46

eshirvana

23,227
3
22
38

This is the same result as what I have done in OP. How do I get the `shipping_details` related to it? – Lemayzeur Oct 01 '21 at 16:12
@Lemayzeur , oh I missed the point . see my updated answer – eshirvana Oct 01 '21 at 16:52

How to get columns which are not in GROUP BY?

3 Answers3