0

I have a Postgresql database where I have these two tables.

shipping_method

id | name     | abbrev
---+----------+-------
1  | Standard | ST
2  | Express  | EX

shipping_details:

id shipping_method_id estimated_time_min estimated_time_max price
2 1 02:00:00 04:00:00 230
3 2 00:03:00 01:00:00 500
4 1 02:00:00 04:00:00 1230
5 1 02:00:00 04:00:00 850
6 2 01:00:00 02:00:00 1785

My goal is to fetch the most expensive shipping details per shipping method (for a specific product [not in OP]).

So far, I wrote this query:

SELECT 
    sm.id, sm.name, MAX(sd.price) AS max_price
FROM 
    shipping_details AS sd 
LEFT JOIN 
    shipping_method AS sm ON sm.id = sd.shipping_method_id
GROUP BY 
    sm.id

which returns:

id | name     | max_price
---+----------+---------
2  | Express  | 1785
1  | Standard | 1230

Through that query, I am not able to get the shipping_details columns without putting them in GROUP BY clause. I mainly need shipping details for each specific shipping method that has the higher price.

How can I achieve that?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Lemayzeur
  • 8,297
  • 3
  • 23
  • 50

3 Answers3

2

Use DISTINCT ON:

SELECT DISTINCT ON (sm.id) sm.id, sm.name, sd.price AS max_price
FROM shipping_details AS sd 
LEFT JOIN shipping_method AS sm
    ON sm.id = sd.shipping_method_id
ORDER BY sm.id, sd.price DESC;

The above logic will return the shipping method having the max price.

Lemayzeur
  • 8,297
  • 3
  • 23
  • 50
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • It works ;-). However, can you explain why you have this `(sm.id) sm.id`. Why you have it twice? – Lemayzeur Oct 01 '21 at 15:51
  • @Lemayzeur The `(sm.id)` term tells Postgres that you want to find the max record per group of records belonging to the same shipping method. The `ORDER BY` clause tells Postgres to find the one record per group having the max price. – Tim Biegeleisen Oct 01 '21 at 15:57
  • Perfect. This solution works well and it takes less time to execute – Lemayzeur Oct 01 '21 at 16:00
1

To get additional columns from each row with the highest price in shipping_details, use DISTINCT ON:

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN (
   SELECT DISTINCT ON (shipping_method_id)
          shipping_method_id AS id, price AS max_price
      --  add more columns as you like
   FROM   shipping_details sd
   ORDER  BY sd.shipping_method_id DESC, sd.price DESC, sd.id  -- ①
   ) sd USING (id);

As long as all rows from shipping_details are involved, it's typically fastest to aggregate first, and then join. (Not when the table holds many additional rows that are eliminated by the join.)

If price can be NULL make that ORDER BY ... price DESC NULLS LAST - else NULL sorts on top in descending order. Be sure to match an existing index.

① An index on shipping_details (shipping_method_id, prize) makes it fast if the table shipping_details is big. Or an index on (shipping_method_id DESC, prize DESC). It's important that the sort order of both columns are in sync with the query. Postgres can scan the index forwards or backwards, but for multicolumn indices the sort order of all columns needs to be in sync with the query. See:

One more issue to note: If there can be multiple shipping details with the highest price, you get an arbitrary pick which can change with every execution, typically after writes to involved rows. To get a stable, deterministic result, add more ORDER BY expressions as tiebreaker. Like sd.id I appended above. Then the smallest id is the winner, consistently.
If there are many ties like that, it even pays to append id to the index. Like (shipping_method_id, prize, id DESC) - note the opposite sort order for id!

Related:

There may be (much) faster techniques, depending on undisclosed details. Assuming many rows per shipping method, and an applicable index as discussed, this should be much faster:

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN LATERAL (
   SELECT sd.price AS max_price, id AS shipping_details_id
   FROM   shipping_details sd
   WHERE  sd.shipping_method_id = sm.id
   ORDER  BY sd.price DESC NULLS LAST
   LIMIT  1
   ) sd ON true;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Yes, It works. I appreciate your answer. But [this answer](https://stackoverflow.com/a/69408600/5644965) takes less time to execute. Thank you! – Lemayzeur Oct 01 '21 at 16:11
  • That's unexpected. This one should be a bit faster - unless there are additional rows in `shipping_details` that are not needed for the query (`shipping_method_id` can be NULL or no FK constraint). Maybe `NULLS LAST` makes a difference. Can be removed if `price` is `NOT NULL`. As long as you match an existing index. – Erwin Brandstetter Oct 01 '21 at 16:25
  • I only have added `id as shipping_details_id` where you put the comment. The execution time becomes a bit closer with your recent edit – Lemayzeur Oct 01 '21 at 16:38
  • @Lemayzeur: I think I pinned down the performance issue. Consider the update. – Erwin Brandstetter Oct 01 '21 at 17:04
  • Can you help me with [this question](https://stackoverflow.com/questions/70188086/what-makes-that-postgresql-query-slow)? Thanks – Lemayzeur Dec 01 '21 at 19:04
0

here is one way using window function:

select *
from shipping_method sm 
join (
      select *, row_number() over (partition by shipping_method_id order by price desc) rn  
      from shipping_details sd) t 
on sd.shipping_method_id = t.id
and rn = 1 ;
eshirvana
  • 23,227
  • 3
  • 22
  • 38