Max Row In Group By

Question

I've tried reading through a number of the solutions I've found, but there seem to be slight differences in the problems where I'm not sure how to adapt them to fit my own.

I have a table of transactions like so:

+------+-------+------------------------------+
| id   | rev_$ | product  | local_currency    |
+------+-------+------------------------------+
| 1    | 15    | shoe     | USD               |
| 2    | 10    | shirt    | USD               |
| 1    | 20    | shoe     | CAD               |
| 2    | 30    | shoe     | GBP               |
| 1    |  8    | shirt    | USD               |
| 2    | 15    | shirt    | USD               |
| 1    | 10    | shoe     | CAD               |
| 2    | 10    | shoe     | USD               |
+------+-------+------------------------------+

I want to aggregate the table so that

I get the total for each ID by product
The local_currency is the currency used for the highest single value transaction (as well as other fields that I haven't included)

So that table should look like this, after aggregating:

+------+-------+------------------------------+
| id   | rev_$ | product  | local_currency    |
+------+-------+------------------------------+
| 1    | 45    | shoe     | CAD               |
| 1    |  8    | shirt    | USD               |
| 2    | 25    | shirt    | USD               |
| 2    | 40    | shoe     | GBP               |
+------+-------+------------------------------+

Similar questions: (1), (2)

are you sure that the expected output is correct? because I see multiple currencies for the same product ID and product. — Abhilash Reddy, Apr 11 '19 at 19:22
Yes, I want the aggregation to choose the currency for the highest value gransaction of that product group with the ID. — riders994, Apr 11 '19 at 19:46

score 4 · Accepted Answer · 2019-04-11T19:53:54.303

4

You can use group by to calculate the total revenue, collect all currencies in an array and pick the one from the highest value:

select id, 
       sum(rev_$), 
       product, 
       (array_agg(local_currency order by rev_$ desc))[1] as local_currency
from orders
group by id, product
order by id, product;

array_agg(local_currency order by rev_$ desc) will create an array for all currencies that are part of the group defined by group by ordered by rev$ descending. So the first element ([1]) is the one corresponding to the "highest single value transaction"

Online example: https://rextester.com/VOK41538

Another option is to write an aggregate function that does this without the array:

create or replace function first_agg (p_one anyelement, p_other anyelement )
  returns anyelement 
  language sql 
  immutable strict
as
$$
  select p_one;
$$;

create aggregate first_element 
(
  sfunc    = first_agg,
  basetype = anyelement,
  stype    = anyelement
);

Then you can use it like this:

select id, 
       sum(rev_$), 
       product, 
       first_element(local_currency order by rev_$ desc) as local_currency
from orders
group by id, product
order by id, product;

Online example: https://rextester.com/YGRR9338

edited Apr 11 '19 at 19:53

answered Apr 11 '19 at 19:38

this is great, didn't know about this array_agg – sloppypasta Apr 11 '19 at 19:46
I really like this solution! Will this be scalable, though? That array agg looks expensive. – riders994 Apr 11 '19 at 19:47
@riders994 How many rows do you expect for each `(id, product)` combination? – Apr 11 '19 at 19:48
It's daily data that I need to aggregate every month, so max of 31 per `(id, product, location)` combination. Rarely does an id have more than 1 location, but some have a lot. Would the inefficiency not be affected by the number of unique `id`s? – riders994 Apr 11 '19 at 19:50
1

@riders994: that means that each array never has more than 31 entries which shouldn't be a problem. I have added an alternative solution – Apr 11 '19 at 19:54

score 1 · Answer 2 · answered Apr 11 '19 at 19:28

could using some subquery

select  m2.id, sum(m2.rev_$), t2.local_currency
my_table m2
from  (
  select distinct local_currency, id  
  from my_table m1 
  inner join  (
    select  id, max(rev_$) max_rev
    from  my_table 
    group by id 
  ) t1  on t1.id = m1.id and t1.max_rev = m1.rev_$ 
) t2 ON m2.id= t2.id

Max Row In Group By

2 Answers2