-1

I'm translating a SAS project to T-SQL and there's a need to calculate the median.

There is ready to use median function in SAS (e.g.: SELECT MEDIAN(col1, col2, col3, col4) FROM myTable), but in SQL Server that doesn't exist.

I've researched and found some good examples to calculate median in T-SQL, but I don't know how to use it in my query as follows:

SELECT 
    'test' AS colTest,
    CASE 
        WHEN c1 < 0 
           THEN (10) 
           ELSE 0 
    END AS myMedian --this 10 value is just for example, it should be get from a median function
FROM 
    #tb_test

I've been trying with the example below, and got this:

CREATE TABLE #tb_test 
(
     id INT, 
     c1 INT, 
     c2 INT, 
     c3 INT, 
     c4 INT
)

INSERT INTO #tb_test VALUES(1, 2, 4, 6, 8)
INSERT INTO #tb_test VALUES(2, -1, 3, 5, 7)

SELECT * FROM #tb_test;

The table is:

|id| c1| c2| c3| c4|
+--+---+---+---+---+
|1 | 2 | 4 | 6 | 8 |
|2 |-1 | 3 | 5 | 7 |

As far as I got to calculate the median for those columns is:

--my getMedian query:
WITH cte AS
(
    SELECT 
        id, Names, Results
    FROM
        (SELECT id, c1, c2, c3, c4
         FROM #tb_test) p
    UNPIVOT
        (Results FOR Names IN (c1, c2, c3, c4)
        ) AS UNPIVOT_tb_test
)
SELECT DISTINCT PERCENTILE_CONT(0.5) 
       WITHIN GROUP (ORDER BY Results) 
       OVER (partition BY id) AS Median
FROM cte

Which results in:

|Median|
+------+
|4     |
|5     |

I've tried to include this getMedian in the final query, but no success. This desired final query would be:

SELECT 
    'test' AS colTest,
    CASE 
        WHEN c1 < 0 
           THEN ([is it possible to use the getMedian query here?]) 
        ELSE 0 
    END AS myMedian
FROM 
    #tb_test

And its respectively desired result:

|colTest | myMedian |
+--------+----------+
|test    | 4        |
|test    | 5        |

Does anyone knows how can I do it?

Thanks in advance.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
jMarcel
  • 958
  • 5
  • 24
  • 54
  • Does this answer your question? [Function to Calculate Median in SQL Server](https://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server) See also this excellent article https://sqlperformance.com/2014/02/t-sql-queries/grouped-median – Charlieface Jun 24 '21 at 19:41
  • I've read it when I was researching, but I think that in those answers there is not a specification about how to calculate median for a set of columns, only for the values in one column. – jMarcel Jun 24 '21 at 21:08
  • Just unpivot the columns then calculate median on that. Unclear if you want the median for every row (in which case unpivot and calculate in a subquery or `apply`) or if you want a single median from all columns, all rows, in which case unpivot then calculate over the whole set – Charlieface Jun 24 '21 at 21:58

3 Answers3

0

It seems you are quite close, I simply added a join to get the following, is this what you want?

WITH cte AS
(
    SELECT 
        id, Names, Results
    FROM
        (SELECT id, c1, c2, c3, c4
         FROM tb_test) p
    UNPIVOT
        (Results FOR Names IN (c1, c2, c3, c4)
        ) AS UNPIVOT_tb_test
), medians as (
  SELECT distinct id, PERCENTILE_CONT(0.5) 
         WITHIN GROUP (ORDER BY Results) 
         OVER (partition BY id) AS Median
  FROM cte
)
select t1.*
  , t2.Median -- add here whatever condition you wish, e.g. case c1<0, etc.
from tb_test as t1 inner join medians as t2 on (t1.id=t2.id)
sg1234
  • 600
  • 4
  • 19
  • In fact, I had already done it that way too, but since there is 48 distinct sets of columns I must use to get the median, I was looking for a more simpler-elegant solution. But thanks anyway :) – jMarcel Jun 24 '21 at 21:20
0

The function does exist. It just requires apply:

select t.*, m.median
from #test t cross apply
     (select top (1) percentile_cont(0.5) within group (order by c) over () as median
      from (values (t.c1), (t.c2), (t.c3), (t.c4)) v(c)
     ) m;

This sort of cheats a little. The function is really a window function, not an aggregation function. The select top (1) just returns one row.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    I think you want `percentile_cont(0.5) within group (order by c) over ()` – Charlieface Jun 24 '21 at 19:45
  • @Gordon,I've tested and worked as @Charlieface mentioned.But how can I put it inside the query as I asked?I've tried this: `SELECT CASE WHEN (c1<0) THEN (select m.median from #tb_test t cross apply (select top(1) percentile_cont(0.5) within group (order by c) over () as median from (values (t.c1), (t.c2), (t.c3), (t.c4)) v(c)) m) ELSE 0 END AS myMedian, 'other columns' AS OtherColumns FROM #tb_test` . But returned **Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.**. – jMarcel Jun 24 '21 at 22:00
  • 1
    @jMarcel If you use `CROSS APPLY` you don't need a subquery, just `SELECT CASE WHEN (c1<0) THEN m.median ELSE 0 END, OtherCols ... FROM #t CROSS APPLY (select top (1)...` Or you can remove the apply and run it like this `SELECT CASE WHEN (c1<0) THEN (select top(1) percentile_cont....` – Charlieface Jun 24 '21 at 22:14
  • Thanks @Gordon! And thank you too @Charlieface, I've adjusted as you explained above and it worked as expected. – jMarcel Jun 25 '21 at 16:44
0

According to @AaronBertrand, the fastest way to calculate a median is to use OFFSET/FETCH. What this does is selects the one or two middle vaues by doing calculations on the total number of values, and averages them.

You can either do this in subquery in the SELECT:

SELECT
    t.*,
    Median = CASE WHEN c1 < 0 THEN (
      SELECT AVG(m.n * 1.0) FROM (
        SELECT n
        FROM (VALUES (t.c1),(t.c2),(t.c3),(t.c4)) v(n)
        ORDER BY v.n
        OFFSET (4 - 1) / 2 ROWS    -- 4 is the number of values
        FETCH NEXT 1 + (1 - 4 % 2) ROWS ONLY  -- again 4
      ) m
    )
    ELSE 0 END
FROM #tb_test t

Or you can put it in an APPLY and select it afterwards


SELECT
    t.*,
    Median = CASE WHEN c1 < 0 THEN m.Median ELSE 0 END
FROM #tb_test t
CROSS APPLY (
    SELECT AVG(m.n * 1.0) FROM (
        SELECT Median = n
        FROM (VALUES (t.c1),(t.c2),(t.c3),(t.c4)) v(n)
        ORDER BY v.n
        OFFSET (4 - 1) / 2 ROWS    -- 4 is the number of values
        FETCH NEXT 1 + (1 - 4 % 2) ROWS ONLY  -- again 4
    ) m
) m
Charlieface
  • 52,284
  • 6
  • 19
  • 43