How to extract the max of the sum of ~120k rows in postgresql?

Question

I have a database for a bike store, with a table I called "SalesOrderHeader" that has, among others, these 2 columns: CustomerID, TotalDue.

It holds information about all sales of the bike store. So if f.e. a customer bought 2 times in total from the store, there will be 2 registries of his CustomerID with the amount of money (TotalDue) he paid each time.
Here is a screenshot of the table in excel, just to get the idea.

How to write a select query that will show the 1 (or more, in case of a tie) customer that has paid the most in total. The result has to bring up 2 columns one with the CustomerID and one with the TotalMoneyHePaid.

http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557 — , Jun 02 '16 at 15:49
You have a mix of quoted and unquoted identifiers in the question. Be aware that [identifiers are case sensitive in Postgres](http://stackoverflow.com/a/20880247/939860). Assuming all unquoted identifiers for my answer. — Erwin Brandstetter, Jun 02 '16 at 15:54

JNevill · Accepted Answer · 2016-06-02T15:38:47.550

You can use a subquery in your HAVING clause to get this:

SELECT customerid,
    sum(totaldue)
FROM SalesOrderHeader
GROUP BY customerid
HAVING sum(totaldue) = (
        SELECT sum(totaldue)
        FROM SalesOrderHeader
        GROUP BY customerID
        ORDER BY sum(totalDue) DESC LIMIT 1
        )

The subquery:

    SELECT sum(totaldue)
    FROM SalesOrderHeader
    GROUP BY customerID
    ORDER BY sum(totalDue) DESC LIMIT 1

Is getting the largest customer's sum(totalDue) by ordering the records by that sum(totaldue) and then keeping only the first record LIMIT 1. We use that in the HAVING clause to compare that to the Sum(totaldue) for each customer. If the customer in the main query has a sum(totaldue) equivalent to the subquery's result, then we keep the record.

postgres=> CREATE TABLE salesorderheader
postgres-> (
postgres(>   customerid integer,
postgres(>   totaldue integer
postgres(> );
CREATE TABLE

postgres=> INSERT INTO salesorderheader VALUES
postgres->   (1, 10),
postgres->   (1, 12),
postgres->   (2, 22),
postgres->   (3, 5),
postgres->   (4, 4);
INSERT 0 5

postgres=> SELECT customerid, sum(totaldue) FROM SalesOrderHeader GROUP BY customerid HAVING sum(totaldue) = (SELECT sum(totaldue) FROM SalesOrderHeader GROUP BY customerID ORDER BY sum(totalDue) desc LIMIT 1);
 customerid | sum
------------+-----
          1 |  22
          2 |  22
(2 rows)

That edge case is from OP: "The question is to write a select query that will show the 1 (or more, in case of a tie) customer that has paid in total, the most." This query will port better to MySQL, but the Window Function version by Erwin will work fine in most every other RDBMS (Oracle, Sql Server, Teradata, etc..) — JNevill, Jun 02 '16 at 15:42
Thanks a lot for the help! I used your solution because it is closer to my knowledge (new to PostgreSQL) — Orestes Polyzos, Jun 02 '16 at 18:58

score 1 · Answer 2 · edited May 23 '17 at 12:24

I suggest the window function rank() over the sum of totaldue in a subquery:

SELECT customerid, total_money_paid
FROM  (
   SELECT customerid, sum(totaldue) AS total_money_paid
        , rank() OVER (ORDER BY sum(totaldue) DESC NULLS LAST) AS rnk
   FROM   salesorderheader
   GROUP  BY 1
   ) sub
WHERE  rnk = 1;

This way, the table is only scanned once. Should be faster. Window functions are evaluated after plain aggregate functions, hence this is possible. Related:

Postgres window function and group by exception

NULLS LAST is only needed if the sum can be NULL (i. e. totaldue can be NULL). Details:

PostgreSQL sort by datetime asc, null first?

I chose this technique because of your requirement:

show the 1 (or more, in case of a tie) customer

If you want exactly 1 (break ties somehow, if there are more), then DISTINCT ON would be preferable:

Select first row in each GROUP BY group?

How to extract the max of the sum of ~120k rows in postgresql?

2 Answers2