3

I'm looking to pull the top 10% of a summed value on a Postgres sever.

So i'm summing a value with sum(transaction.value) and i'd like the top 10% of the value

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    Please show us some sample data and the expected output –  Oct 07 '13 at 07:43
  • I have customers with customer id's and a value they have spent so i'd like the top 10% of the customers who have spent money. Select customer.id, sum(transaction.value) from customer Basically the output would be: Customer_Id Total Value 3218947932 289.00 – Quinn Olive Oct 07 '13 at 08:14
  • When you say 10% you mean the customers that sums up 10% of total sum (e.g. if total is $100 you have the firsts that sums up to $10) or 10% of number of rows (e.g. if you have 100 rows, you want the 10 first rows as result)? – MatheusOl Oct 07 '13 at 13:22
  • The latter of the two. If there are 100 rows each with their own sum value, i'd like the top 10% of those number of rows. – Quinn Olive Oct 08 '13 at 08:05
  • 1
    Essential information should go into the question, not just into comments. Please *edit* your question for better response. (Press "edit" left under the question.) – Erwin Brandstetter Oct 14 '13 at 09:49

1 Answers1

4

From what I gather in your comments, I assume you want to:

  1. Sum transactions per customer to get a total per customer.
  2. List the top 10 % of customers who actually have transactions and spent the most.

WITH cte AS (
   SELECT t.customer_id, sum(t.value) AS sum_value
   FROM   transaction t
   GROUP  BY 1
   )
SELECT *, rank() OVER (ORDER BY sum_value DESC) AS sails_rank
FROM   cte
ORDER  BY sum_value DESC
LIMIT (SELECT count(*)/10 FROM cte)

Major points

  • Best to use a CTE here, makes the count cheaper.

  • The JOIN between customer and transaction automatically excludes customers without transaction. I am assuming relational integrity here (fk constraint on customer_id).

  • Dividing bigint / int effectively truncates the result (round down to the nearest integer). You may be interested in this related question:
    PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

  • I added a sails_rank column which you didn't ask for, but seems to fit your requirement.

  • As you can see, I didn't even include the table customer in the query. Assuming you have a foreign key constraint on customer_id, that would be redundant (and slower). If you want additional columns from customer in the result, join customer to the result of above query:

WITH cte AS (
   SELECT t.customer_id, sum(t.value) AS sum_value
   FROM   transaction t
   GROUP  BY 1
   )
SELECT c.customer_id, c.name, sub.sum_value, sub.sails_rank
FROM  (
   SELECT *, rank() OVER (ORDER BY sum_value DESC) AS sails_rank
   FROM   cte
   ORDER  BY sum_value DESC
   LIMIT (SELECT count(*)/10 FROM cte)
   ) sub
JOIN  customer c USING (customer_id);
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228