0

I am trying to computer the median number of transactions in each category. A few notes (as the dataset below is a small snippet of a much larger dataset):

  • An employee can belong to multiple categories
  • Each transaction's median should be > 0
  • Not every person appears in every category

The data is set up like this:

|  Person | Category | Transaction |
|:-------:|:--------:|:-----------:|
| PersonA |   Sales  |      27     |
| PersonB |   Sales  |      75     |
| PersonC |   Sales  |      87     |
| PersonD |   Sales  |      36     |
| PersonE |   Sales  |      70     |
| PersonB |   Buys   |      60     |
| PersonC |   Buys   |      92     |
| PersonD |   Buys   |      39     |
| PersonA |    HR    |      59     |
| PersonB |    HR    |      53     |
| PersonC |    HR    |      98     |
| PersonD |    HR    |      54     |
| PersonE |    HR    |      70     |
| PersonA |   Other  |      46     |
| PersonC |   Other  |      66     |
| PersonD |   Other  |      76     |
| PersonB |   Other  |      2      |

An ideal output would look like:

| Category | Median | Average |
|:--------:|:------:|:-------:|
|   Sales  |   70   |    59   |
|   Buys   |   60   |    64   |
|    HR    |   59   |    67   |
|   Other  |   56   |    48   |

I can get the average by:

SELECT
    Category,
    AVG(Transaction) AS Average_Transactions
FROM
    table
GROUP BY
    Category

And that works great!

This post tried to help me find the median. What I wrote was:

SELECT
    Category,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM
    table
GROUP BY
    Category

But I get an error:

Msg 8120: Column 'Transactions' is invalid in the select list because it is not contained in either an aggregate function or the **GROUP BY** clause

How can I fix this?

artemis
  • 6,857
  • 11
  • 46
  • 99

3 Answers3

2

You can do what you want using SELECT DISTINCT:

SELECT DISTINCT Category,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM table;

Unfortunately, SQL Server doesn't offer the PERCENTILE_ functions as window functions and doesn't have a MEDIAN() aggregation function. You can also do this using subqueries and counts.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

It's not optimal but this is your solution

SELECT DISTINCT
       category,
       PERCENTILE_DISC(0.5)WITHIN GROUP(ORDER BY val) OVER (PARTITION BY category) AS Median_Transactions,
       AVG(val) OVER (PARTITION BY d.category)  [AVG]
FROM #data d;
Ilyes
  • 14,640
  • 4
  • 29
  • 55
eshirvana
  • 23,227
  • 3
  • 22
  • 38
0

I don't think this is pretty but it works. I didn't spend time on polishing it

with 
avg_t as 
( select category, avg(sales) as avg_sales 
  from sample 
   group by 1), 
mn as 
(  select category, avg(sales) as median_sales 
   from (
     select category, sales , 
            row_number() over  (partition by category order by sales asc) as r , 
            count(person) over (partition by category) as total_count 
     from  sample 
         ) mn_sub 
    where (total_count % 2 = 0 and r in ( (total_count/2), ((total_count/2)+1)) ) or 
           (total_count % 2 <> 0 and r = ((total_count+1)/2))
         
    group by 1 
)
select avg_t.category, avg_t.avg_sales, mn.median_sales  
 from avg_t 
 inner join mn 
 on avg_t.category=mn.category