17

I was wondering if there are any functions that can be used in MySQL to select the TOP X(or bottom) percent from a column containing numeric values.

Basically, I have a column containing a list of prices and I only want to return those fields in the top ten percentile of prices. Any suggestions?

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
Spencer
  • 21,348
  • 34
  • 85
  • 121
  • If only MySQL supported analytics -- NTILE specifically for this case, but ROW_NUMBER, DENSE_RANK, LEAD and LAG... Your query has to calculate the percentile in a derived table/inline view in order to filter/order by it. – OMG Ponies Jan 19 '11 at 21:59
  • https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_ntile mysql 8 seems to support ntiles and other window functions now – domgom Jun 06 '23 at 10:55

3 Answers3

9

just as an FYI (i know this question is a few years old), this can be done other, cleaner ways as well.

SELECT * FROM product_table WHERE price >= (SELECT price FROM product_table 
ORDER BY price DESC LIMIT 1 OFFSET (SELECT 0.1 * COUNT(*) FROM product_table));

i ran a similar query over a very large database, and it ran very quickly.

jasonmclose
  • 1,667
  • 4
  • 22
  • 38
  • Thanks, this just came in handy. What's the purpose of 'LIMIT 1'? Does that not limit the results to only 1 starting at the value of OFFSET? – Leonidas Apr 25 '13 at 03:04
  • the limit 1 will give you the exact value at 90%. – jasonmclose Apr 25 '13 at 13:24
  • Is there a MySQL version of this answer? I can't seem to get offset to take the value from the select statement, (SELECT 0.1 * COUNT(*) FROM product_table) – sk8asd123 Aug 20 '13 at 20:04
  • i don't have a MySQL install in front of me, but can you just subquery the count? basically, select 0.1 * a.count from (select count(*) from product_table) a; – jasonmclose Aug 21 '13 at 20:45
  • This doesn't work. The OFFSET part causes an error to be thrown :/ – TheRealChx101 Jun 02 '22 at 12:45
9

EDIT - new answer

Answered in Convert SQL Server query to MySQL

Select *
from
(
    SELECT tbl.*, @counter := @counter +1 counter
    FROM (select @counter:=0) initvar, tbl
    ORDER BY ordcolumn
) X
where counter <= (50/100 * @counter);
ORDER BY ordcolumn


OLD ANSWER

For MySQL, you could calculate the batch size required and then LIMIT to that number of records

SELECT @rows := ROUND(COUNT(*) * 10/100) FROM table;
PREPARE STMT FROM ‘SELECT * FROM tbl ORDER BY price LIMIT ?’;
EXECUTE STMT USING @rows;

For a bottom percent, just order in reverse

SELECT @rows := ROUND(COUNT(*) * 10/100) FROM table;
PREPARE STMT FROM ‘SELECT * FROM tbl ORDER BY price DESC LIMIT ?’;
EXECUTE STMT USING @rows;

Oops, maybe the DESC belongs in the first query, but you get the meaning.

Note For SQL Server, the TOP N PERCENT clause certainly helps

select top 10 PERCENT *
FROM TBL
ORDER BY price
Community
  • 1
  • 1
RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • Neat trick, but isn't this is calculating the top 10% of rows by price, not all the rows in the top 10 percentile of price (which is what the question seems to be asking for per accepted answer; although it is a bit ambiguous) – tardate Oct 06 '11 at 12:47
  • Very nice indeed. @tardate for getting the top just ORDER BY DESC and change 50/100 for 10/100.. – TMichel May 18 '16 at 14:03
  • Go back to the old answer! Assigning @variables inside a `SELECT` is deprecated. You have found one of the reasons for such. – Rick James Jun 15 '21 at 03:56
7

UPDATE: Much more thought-out explanation of the subject from much more knowing person here. Nonetheless, it still seems there's no embedded function in MySQL to calculate percentiles.

Try:

SELECT * FROM prices WHERE price >= (SELECT 0.9 * max(price) FROM prices)

SELECT price FROM prices p1 WHERE
(SELECT count(*) FROM prices p2 WHERE p2.price >= p1.price) <=
     (SELECT 0.1 * count(*) FROM prices)
);

This will give price P1 for which number of records in Price table having price >= P1 will be one tenth of total number of records in Price table. After that:

SELECT * FROM prices WHERE price >= (SELECT price FROM prices p1 WHERE
(SELECT count(*) FROM prices p2 WHERE p2.price >= p1.price) <=
     (SELECT 0.1 * count(*) FROM prices)
);

will return all desired records.

Note: I didn't examine performance of this query, I think solution with temporary table/variable must be more effective.

rk567
  • 289
  • 1
  • 4
  • 16
Victor Sorokin
  • 11,878
  • 2
  • 35
  • 51