0

I have a dataset with columns - customer, product, and quantity. I want to find the median of quantity with respect to different products. Assuming we need to work only for an odd number of rows.

Functions like With, Join, aggregate functions like count, avg, max, min, etc are allowed. A solution using nested subqueries would be ideal for this question.

So far I have listed down the quantity and product in sorted order and found the median number using ROUND((COUNT(QUANT) / 2) + 1) and now I need to find that median row without using any window function.

Input

Product Quantity
A 1
A 2
A 3
B 5
B 6
B 7
C 11
C 13
C 15
D 4
D 5
D 6

Output

Product Median
A 2
B 6
C 13
D 5
John
  • 1
  • 1
  • 1
    How do you expect to calculate something without using any functions? –  Dec 02 '21 at 07:32
  • 1
    What do you mean by "now I need to find that median row" ? The median is a calculated value according to your formula, this value may not correspond to any row in your table, so what should be the median row ? – Edouard Dec 02 '21 at 08:44
  • [SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY x) FROM t;](https://stackoverflow.com/a/40317116/5298879) is a simpler way to get median but as @Edouard says, it might not correspond to any row. If you had 4 rows with values 1,2,3,4, their median is 2.5 - there is no row with 2.5 in it. – Zegarek Dec 02 '21 at 11:31
  • @a_horse_with_no_name sorry I should have been clear I cannot use any functions like rownumber or percentile or anything direct. – John Dec 02 '21 at 15:15
  • If you can't use any function you can't even use `count()` or `round()` so this whole question makes no sense to me. –  Dec 02 '21 at 15:15
  • @Edouard with the example I have given in the post I know that I have to select 2nd value for each product but with that information, I cannot go over the table and select the median value – John Dec 02 '21 at 15:17
  • @a_horse_with_no_name no functions with rownumber or percentile. aggregate functions, group by, count, round and joins is fine. – John Dec 02 '21 at 15:18
  • What wrong with using window functions if that solves your problem? –  Dec 02 '21 at 15:19
  • @a_horse_with_no_name Can't use that according to the spec sheet that's the reason I am reaching out here. – John Dec 02 '21 at 15:21
  • @Zegarek Sorry I should have been clear on this. We can make a solution based on odd number of rows. – John Dec 02 '21 at 15:22
  • @John What about `join`, `where`, `limit`, `offset`, [`with` cte](https://www.postgresql.org/docs/14/queries-with.html)? What about subqueries? Are all [window functions](https://www.postgresql.org/docs/14/functions-window.html) forbidden? Please clarify your constraints and assumptions about input data in the question. Some rationale behind those constrains would be helpful as well – Zegarek Dec 02 '21 at 15:51
  • @Zegarek We can use the functions that you have mentioned. As for the window functions mentioned here https://www.postgresql.org/docs/14/functions-window.html I cannot use that directly however implementation of these functions is allowed in some other way like defining your own index is possible. As this is part of an assignment that's the reason for these constraints. Subqueries are fine too. – John Dec 02 '21 at 16:00
  • Usage of window functions can usually be replaced by self join on the table. – Edouard Dec 02 '21 at 16:29
  • @Edouard I did try to do that but I am really stuck on how to approach this. – John Dec 03 '21 at 03:23

1 Answers1

0

You can try these two solutions where prod is the name of your table :

Solution 1 with Window function

SELECT DISTINCT ON (t.Product)
       t.Product
     , nth_value(t.Quantity, m.median) OVER (PARTITION BY t.Product ORDER BY Quantity ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS median_Qty
  FROM prod AS t
 INNER JOIN
     ( SELECT Product, round(count(*)/2+1) :: integer AS median
         FROM prod
        GROUP BY Product
     ) AS m
    ON m.Product = t.Product

Solution 2 without Window function

This solution is based on the SELECT ... LIMIT 1 OFFSET median query where median is a value only known at the run time, so this query must be implemented as a dynamic statement within a plpgsql function whose input parameter is the Product :

CREATE OR REPLACE FUNCTION median_quantity(INOUT myProduct varchar(1), OUT Median_Quantity integer)
RETURNS setof record LANGUAGE plpgsql AS
$$
DECLARE
  median integer ;
BEGIN
    SELECT round(count(*)/2) :: integer
      INTO median
      FROM prod
     WHERE Product = myProduct ;
     
    RETURN QUERY EXECUTE E'
    SELECT Product, Quantity
      FROM prod
     WHERE Product = ' || quote_nullable(myProduct) || '
     ORDER BY Quantity
     LIMIT 1
     OFFSET ' || median ;
END ;
$$ ;

The expected result is given by the query :

SELECT m.myProduct, m.Median_Quantity
  FROM 
     ( SELECT DISTINCT ON (Product)
              Product
         FROM prod
     ) AS p
 CROSS JOIN LATERAL median_quantity(p.Product) AS m

All the details in dbfiddle

Edouard
  • 6,577
  • 1
  • 9
  • 20