5

I have a need to generate a particular report from some data and I am having a great deal of trouble figuring out the proper usage of PERCENTILE_CONT to provide the results I need. I would like to include a column in my query result which shows what value is the 95th percentile from a range of values.

I have a table as follows:

customer_id sale_amount sale_date
1   265.75  2019-09-11 00:00:04.000
1   45.75   2019-09-10 01:00:04.000
1   2124.77 2019-09-10 04:00:04.000
1   66.99   2019-09-10 04:20:04.000
1   266.49  2019-09-09 11:20:04.000
1   3266.49 2019-09-08 11:20:04.000

Pretty straightforward.

I can run the following query, no problem:

select
    min(sale_amount) as minimum_sale,
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale
from
    sales
where
    customer_id = 1;

Which results in the following output:

minimum_sale    maximum_sale    average_sale
45.75           3266.49     1006.040000

What I'm after is a fourth column, perc_95, which would calculate what value represents the 95th percentile in terms of sale_amount.

This works to get me the value:

select distinct
    customer_id,
    percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
    sales;

Output:

customer_id perc_95
1            2981.06

But I can't seem to combine them - this fails:

select distinct
    (customer id),
    min(sale_amount) as minimum_sale,
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale,
    percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
    sales
where
    customer_id = 1;

Output:

Column 'sales.customer_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I generally understand what this error means but I am having trouble figuring out how to deal with it in this context.

My desired output:

customer_id     minimum_sale      maximum_sale  average_sale    perc_95
1                   45.75         3266.49  1006.040000     2981.06
Dai
  • 141,631
  • 28
  • 261
  • 374
user3908134
  • 81
  • 1
  • 7

3 Answers3

3

Use window functions:

select distinct customer_id,
       min(sale_amount) over (partition by customer_id) as minimum_sale, 
       max(sale_amount) over (partition by customer_id) as maximum_sale,
       avg(sale_amount) over (partition by customer_id) as average_sale,
       percentile_cont(0.95) within group (order by sale_amount)  over (partition by customer_id) as perc_95
from sales
where customer_id = 1;

It is highly inconvenient that SQL Server does not support functions such as percentile_cont() as aggregation functions, requiring people to use select distinct for aggregation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Don't use DISTINCT.

I'd try this at first:

select 
    min(customer_id) AS CustomerID, 
    min(sale_amount) as minimum_sale, 
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale, 
    percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from sales 
where customer_id = 1;

If you get the same error message, but this time about percentile_cont, then wrap it in min function as well:

select 
    min(customer_id) AS CustomerID, 
    min(sale_amount) as minimum_sale, 
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale, 
    min(percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id)) as perc_95
from sales 
where customer_id = 1;
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
0

Another approach is to split the query into two separate CTE steps: one using SELECT DISTINCT with window-functions, the other using GROUP BY, with a final query to JOIN the two together, like so:

DECLARE @customerId int = 1;

WITH aggs AS (

    SELECT
        customer_id,
        MIN( sale_amount ) AS minimum_sale,
        AVG( sale_amount ) AS mean_sale,
        MAX( sale_amount ) AS maximum_sale
    FROM
        sales
    WHERE
        customer_id = @customerId
    GROUP BY
        customer_id
),
wnds AS (

    SELECT DISTINCT
        customer_id,
        percentile_cont(0.05) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_05,
        percentile_cont(0.50) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as median,
        percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
    FROM
        sales
    WHERE
        customer_id = @customerId
)
SELECT
    ISNULL( a.customer_id, w.customer_id ) AS customer_id,
    a.minimum_sale,
    p.perc_05,
    p.median,
    a.mean_sale,
    p.perc_95,
    a.maximum_sale

FROM
    aggs AS a
    INNER JOIN wnds AS w ON a.customer_id = w.customer_id
Dai
  • 141,631
  • 28
  • 261
  • 374