4

I'm trying to find the 25th and 75th percentiles in MariaDB 10.4.11, according to https://mariadb.com/kb/en/percentile_cont/ I believe the below code is the correct way to do it, however it returns the same result for each calculation?

select name, 
    percentile_cont(0.25) within group (order by sell_price) over (partition by name) as percentile_25,
    percentile_cont(0.5) within group (order by sell_price) over (partition by name) as median,
    percentile_cont(0.75) within group (order by sell_price) over (partition by name) as percentile_75
from commodity
group by name;

Sample Data;

market_id    name        sell_price 
3223191296   beer       175
128081144    beer       175
3225577472   beer       338
3228907520   beer       409
128666762    beer       600
3223210496   beer       646
3543674368   beer       647
3543674368   beer       647
3227117312   beer       690
3224189696   beer       704
3227711744   beer       709
128754255    beer       756
3223191296   coffee     1286
128081144    coffee     1286
3228907520   coffee     1601
3225577472   coffee     1694
128666762    coffee     1703
128754255    coffee     1842
3223210496   coffee     1892
3227117312   coffee     1928
3227711744   coffee     1956
3224189696   coffee     1965
3543674368   coffee     2245
3223891456   coffee     2733
3223891456   beer       4431

Expected result (made up);

name        percentile_25   median  percentile_75
beer        338             646     704
coffee      1694            1892    2245
5318008
  • 77
  • 5
  • Please provide sample data, current and expected results as tabular text so one can understand the issue and test possible solutions. – GMB Jan 27 '20 at 11:13

2 Answers2

4

percentile_cont() is a window function and not an aggregation function.

A simple solution is to use select distinct rather than group by:

select distinct name, 
       percentile_cont(0.25) within group (order by sell_price) over (partition by name) as percentile_25,
       percentile_cont(0.50) within group (order by sell_price) over (partition by name) as median,
       percentile_cont(0.75) within group (order by sell_price) over (partition by name) as percentile_75
from commodity;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

The PERCENTILE_CONT function is a window function and therefore is applied to the entire result set. You could get the output you want by aggregating by name and taking the max of each expression:

SELECT
    name, 
    MAX(percentile_25) AS percentile_25, 
    MAX(median) AS median, 
    MAX(percentile_75) AS percentile_75
FROM
(
    SELECT
        name,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS percentile_25,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS median,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS percentile_75
    FROM commodity
) t
GROUP BY name;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • I get this error; ERROR 4074 (HY000): Window functions can not be used as arguments to group functions. – 5318008 Jan 27 '20 at 11:25
  • Nesting the max appears to work, but seems like a workaround rather than a solution – 5318008 Jan 27 '20 at 11:32
  • Ran on the full dataset in 3 mins 41 secs. – 5318008 Jan 27 '20 at 11:43
  • @5318008 . . . You accepted this answer although the syntax is incorrect and generates an error. At the very least, you should edit the answer and put the working solution in the answer. – Gordon Linoff Jan 27 '20 at 12:34