6

This should be very straightforward, but as a newbie to SQL I am really struggling. I've been recommended to use PERCENTILE_CONT with continuous (non-discrete) data.

The data in question concerns two columns: (1) the IDs for a list of patients and (2) the average number of events per year.

Working from some code I found online, this is where I am up to

SELECT ID,
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_75
FROM AE_COUNT;

This just seems to report thee columns each with identical values of PPPY.

Any idea where I'm going wrong?

user518206
  • 93
  • 1
  • 1
  • 7
  • 1
    `PARTITION BY ID` I guess you want to partition for something else. If ID is unique/primary key there is only one value per group. – Lukasz Szozda Jul 26 '18 at 15:49
  • Well spotted! I've added a column equal to one and used that. Thanks you. Is there a better way of calculating a percentile for an entire sample, for my future reference? – user518206 Jul 26 '18 at 15:51
  • If you want the overall value you can use `OVER()`. This applies to any window function. – Manngo May 13 '23 at 01:16

2 Answers2

3

Assuming that you want to get the percentiles for the whole table, try this:

SELECT Distinct
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_75
FROM AE_COUNT;

Removing the partition statement will run it against the whole table. I also removed the Id column form the select statement and made it distinct.

I would also like to point out that you said the second column is the average number of events per year. I have no idea what you need the percentiles for, but be aware that calculating a percentile of the averages of a group of sets will not yield the same result as calculating the percentiles of a union of the sets.

  • Looks like I initially missed the Oracle tag on the question. I don't believe Oracle requires the Over clause, so you could drop it entirely. Other SQL implementations may require it though. – Robert Corell Jul 26 '18 at 16:31
1

PERCENTILE_CONT() is either a window function or an aggregate function. If you want a single row summarized for all the data, use it as an aggregate function:

SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC)  as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT;

If you wanted the value per patient, you would do:

SELECT id, percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC)  as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT
GROUP BY id;

However, patients probably have very few rows, so the values are likely to be the same for any given patient.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786