Aggregate highest prices per client of salesmen

Question

I have a table like this:

SELECT * FROM orders;

 client_id | order_id | salesman_id | price
-----------+----------+-------------+-------
         1 |      167 |           1 |    65
         1 |      367 |           1 |    27
         2 |      401 |           1 |    29
         2 |      490 |           2 |    48
         3 |      199 |           1 |    68
         3 |      336 |           2 |    22
         3 |      443 |           1 |    84
         3 |      460 |           2 |    92

I want to find the an array of order_ids for each of the highest priced sales for each unique salesman and client pair. In this case I want the resulting table:

 salesman_id |   order_id
-------------+----------------
           1 | {167, 401, 443}
           2 | {490, 460}

So far I have an outline for a query:

SELECT salesman_id, max_client_salesman(order_id)
FROM orders
GROUP BY salesman_id;

However I'm having trouble writing the aggregate function max_client_salesman.

The documentation online for aggregate functions and arrays in postgres is very minimal. Any help is appreciated.

How would you deal with two equally highest prices for one `(salesman_id, client_id)`? Pick one randomly? Pick both? Also, as *always*, declare the version of PostgreSQL in use. — Erwin Brandstetter, Feb 06 '13 at 02:37
If there are matching prices then it doesn't really matter which one is chosen. I'm working on postgres 9.0 (unfortunately). Thanks for the answer! — wesdotcool, Feb 06 '13 at 22:05

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

Standard SQL

I would combine the window function last_value() or firstvalue() with DISTINCT to the get the orders with the highest price per (salesman_id, client_id) efficiently and then aggregate this into the array you are looking for with the simple aggregate function array_agg().

SELECT salesman_id
      ,array_agg(max_order_id) AS most_expensive_orders_per_client
FROM  (
   SELECT DISTINCT
          salesman_id, client_id
         ,last_value(order_id) OVER (PARTITION BY salesman_id, client_id
                                     ORDER BY price
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND UNBOUNDED FOLLOWING) AS max_order_id
   FROM   orders
   ) x
GROUP  BY salesman_id
ORDER  BY salesman_id;

Returns:

 salesman_id |  most_expensive_orders_per_client
-------------+------------------------------------
           1 | {167, 401, 443}
           2 | {490, 460}

SQL Fiddle.

If there are multiple highest prices per (salesman_id, client_id), this query pick only one order_id arbitrarily - for lack of definition.

For this solution it is essential to understand that window functions are applied before DISTINCT. How you to combine DISTINCT with a window function:

PostgreSQL: running count of rows for a query 'by minute'

For an explanation on ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING refer to this closely related answer on dba.SE.

Simper with non-standard `DISTINCT ON`

PostgreSQL implements, as extension to the SQL standard, DISTINCT ON. With it you can very effectively select rows unique according to a defined set of columns.
It won't get simpler or faster than this:

SELECT salesman_id
      ,array_agg(order_id) AS most_expensive_orders_per_client
FROM  (
   SELECT DISTINCT ON (1, client_id)
          salesman_id, order_id
   FROM   orders
   ORDER  BY salesman_id, client_id, price DESC
   ) x
GROUP  BY 1
ORDER  BY 1;

SQL Fiddle.

I also use positional parameters for shorter syntax. Details:

Select first row in each GROUP BY group?

The first answer works. I didn't understand window functions until now. Thanks for the answer! — wesdotcool, Feb 06 '13 at 22:04
@wesdotcool: Nice. :) BTW, both answers should work and the 2nd should be faster. I added an SQLfiddle to demonstrate (and to play with). — Erwin Brandstetter, Feb 06 '13 at 22:39

score 0 · Answer 2 · answered Feb 06 '13 at 00:28

I think you want the Postgres function array_agg in combination with row_number() However, your description of the query does not make sense to me.

The following gets clients and salesmen and the list of orders for the highest priced order by salesman:

select client_id, salesman_id, array_agg(order_id)
from (select o.*,
             row_number() over (partition by salesman_id order by price desc) as sseqnum,
             row_number() over (partition by client_id order by price desc) as cseqnum
      from orders o
     ) o
where sseqnum = 1
group by salesman_id, client_id

I don't know what you mean by "highest priced sales for each salesman and client". Perhaps you want:

where sseqnum = 1 or cseqnum = 1

I accidentally included an extra column in the example output. By "highest priced sales for each salesman and client" I mean that I want an order_id to be included in an array if it was the highest price interaction between a client and a salesman. So if Bob sells Jim a $50 and $40 item, I just want the order_id of the $50 item. — wesdotcool, Feb 06 '13 at 01:16

Aggregate highest prices per client of salesmen

2 Answers2

Standard SQL

Simper with non-standard DISTINCT ON

Simper with non-standard `DISTINCT ON`