Calculating percentiles in SQL

Question

This should be very straightforward, but as a newbie to SQL I am really struggling. I've been recommended to use PERCENTILE_CONT with continuous (non-discrete) data.

The data in question concerns two columns: (1) the IDs for a list of patients and (2) the average number of events per year.

Working from some code I found online, this is where I am up to

SELECT ID,
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_75
FROM AE_COUNT;

This just seems to report thee columns each with identical values of PPPY.

Any idea where I'm going wrong?

`PARTITION BY ID` I guess you want to partition for something else. If ID is unique/primary key there is only one value per group. — Lukasz Szozda, Jul 26 '18 at 15:49
Well spotted! I've added a column equal to one and used that. Thanks you. Is there a better way of calculating a percentile for an entire sample, for my future reference? — user518206, Jul 26 '18 at 15:51
If you want the overall value you can use `OVER()`. This applies to any window function. — Manngo, May 13 '23 at 01:16

score 3 · Answer 1 · answered Jul 26 '18 at 16:16

Assuming that you want to get the percentiles for the whole table, try this:

SELECT Distinct
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_75
FROM AE_COUNT;

Removing the partition statement will run it against the whole table. I also removed the Id column form the select statement and made it distinct.

I would also like to point out that you said the second column is the average number of events per year. I have no idea what you need the percentiles for, but be aware that calculating a percentile of the averages of a group of sets will not yield the same result as calculating the percentiles of a union of the sets.

Looks like I initially missed the Oracle tag on the question. I don't believe Oracle requires the Over clause, so you could drop it entirely. Other SQL implementations may require it though. — Robert Corell, Jul 26 '18 at 16:31

score 1 · Answer 2 · answered Jul 26 '18 at 16:18

PERCENTILE_CONT() is either a window function or an aggregate function. If you want a single row summarized for all the data, use it as an aggregate function:

SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC)  as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT;

If you wanted the value per patient, you would do:

SELECT id, percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC)  as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT
GROUP BY id;

However, patients probably have very few rows, so the values are likely to be the same for any given patient.

Calculating percentiles in SQL

2 Answers2