21

I'll like to find Nth percentile.

for example: table: htwt; columns: name, gender, height, weight

result:

| gender | 90% height | 90% weight |
| male   |        190 |         90 |
| female |        180 |         80 |
Eric Tan
  • 243
  • 1
  • 2
  • 5

2 Answers2

23

sqlite is not strong in analytical processing but if your data is not very large, you can try to emulate percentile with ORDER BY, LIMIT 1 and a calculated OFFSET. Note that OFFSET is zero-based so you need to adjust it by one.

SELECT
  height AS 'male 90% height'
FROM table
WHERE gender='male'
ORDER BY height ASC
LIMIT 1
OFFSET (SELECT
         COUNT(*)
        FROM table
        WHERE gender='male') * 9 / 10 - 1;
p.matsinopoulos
  • 7,655
  • 6
  • 44
  • 92
laalto
  • 150,114
  • 66
  • 286
  • 303
  • What do you mean by "large" data? What range is the solution here good for in terms of # of rows of data in the table? And can you elaborate on the offset part on how you calculate that? – David Aug 05 '15 at 05:47
  • 1
    I have no idea about the data size range. If it works for you, good. The offset is for selecting the row at 90% of the data ordered by height - there's the other query for retrieving `male_count`. – laalto Aug 05 '15 at 08:01
8

I needed multiple percentages (10, 20 ... 100%) and solved it with:

WITH p AS (SELECT height, NTILE(10) OVER (ORDER BY height) AS percentile
           FROM table
           WHERE gender = 'male')
SELECT percentile, MAX(height) as height
FROM p
GROUP BY percentile;

This solution requires SQLite 3.28.0 or later for the NTILE window function.

sasha
  • 231
  • 3
  • 5
  • Note that it works only if the bins have all the same size or really close to that. For example, I divided my table into 100 bins using `ntile(100)`, ending up with 7 samples in bins 1-55 and 6 samples in bins 56-100. So all my top bins have only 6/7 (86%) the size of the bottom bins, which is quite a big difference. – Martin Jambon Aug 12 '21 at 21:37