1

I have this query that I need to graph.
The graph will show Total patients, patient Longest wait (minutes) and patient Median wait for each location. I need to include Medians in my query, and that is where I am stuck.

Example Raw Data:

enter image description here

Query I have so far w/o Medians:

    SELECT 
[Location],
      count([Patient_Number]) Total,
      MAX([WaitTime_Min]) LongWait
     
  FROM MyTable
  where
  [Location] in ('AMB', 'PEDS', 'WALK')
  and [EDNurse] is NULL
  group by [Location]

Output:

enter image description here

I need help getting a last column of Medians for WaitTime (from the raw data) for each location. Any help would be appreciated; Thanks!!

Desired Output:

enter image description here

Monks Den
  • 55
  • 7
  • 2
    Does this answer your question? [Function to Calculate Median in SQL Server](https://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server) – Thom A Jan 06 '22 at 17:14
  • can you group by location and use last? – Golden Lion Jan 06 '22 at 18:14

1 Answers1

1

There are PERCENTILE_CONT(for continues values), PERCENTILE_DISC(for discrete value) methods in Sqlserver 2012 for doing that. you can read more about it here.

SELECT 
[Location],
      count([Patient_Number]) Total,
      MAX([WaitTime_Min]) LongWait,
     max(MedianDisc) MedianDisc, -- Discrete median
     max(MedianCont) MedianCont -- continues median
  FROM (
          select  m.*, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY WaitTime_Min)   
                            OVER (PARTITION BY [Location]) AS MedianCont  
      ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY WaitTime_Min)   
                            OVER (PARTITION BY [Location]) AS MedianDisc
            from myTable m
             where
  [Location] in ('AMB', 'PEDS', 'WALK')
  and [EDNurse] is NULL
   
  )tt
 
  group by [Location]

Update: based on performance issues in this method that Aaron commented, I add a pure SQL calculation for Median:

select  [Location], 
        count([Patient_Number]) Total,
        MAX([WaitTime_Min]) LongWait,
        AVG(1.0 * LongWait) As Median
FROM
(
    SELECT [Location],
       [Patient_Number]  Total,
       [WaitTime_Min]  LongWait
        , ra=row_number() over (partition by [Location] order by [WaitTime_Min])
        , rd=row_number() over (partition by [Location] order by [WaitTime_Min] desc)
    from myTable m
    where [Location] in ('AMB', 'PEDS', 'WALK')
      and [EDNurse] is NULL
) as x
where ra between rd-1 and rd + 1
group by [Location]
Farshid Shekari
  • 2,391
  • 4
  • 27
  • 47
  • Just keep in mind these functions are not fantastic. See [this article](https://sqlperformance.com/2014/02/t-sql-queries/grouped-median), for example, and some other ideas in the [Grouped Median section here](https://sqlperformance.com/2015/08/sql-plan/improving-the-top-top-descending-median-solution). – Aaron Bertrand Jan 06 '22 at 17:50
  • Just curious Farshid, why are you uinsg MAX for medianDisc and MedianCOnt? – Monks Den Jan 06 '22 at 19:20
  • @MonksDen You got the below error if don't want to use the aggregate function for it: Column 'tt.medianDisc' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. – Farshid Shekari Jan 06 '22 at 19:27
  • Ok, I just want to make sure that I am getting the actual Median value. Thanks! – Monks Den Jan 06 '22 at 19:29
  • It's not working. It is pulling the MAX wait time number from the waittime column. For example, if you have: 1,2,5,10,18 it's pulling 18 as the median. I need the median not the max? Any ideas? – Monks Den Jan 06 '22 at 20:16
  • @MonksDen I don't think so, I just add the median calculation section and didn't any change your init code. BTW I updated my code. – Farshid Shekari Jan 06 '22 at 20:21
  • @AaronBertrand I added another solution based on your blog. if you have any idea about it, I am eager to hear from you. – Farshid Shekari Jan 06 '22 at 20:24
  • The updated query works much better. The only change is that partition by should be "Patient_Number" instead. it was giving me wrong patient totals with "location". Thanks again for your help! – Monks Den Jan 06 '22 at 20:48
  • @MonksDen I thought your grouping was based on Location as you said in your question cause of that I assumed that way, any time. – Farshid Shekari Jan 06 '22 at 20:52
  • When I use Location, the totals are all wrong. When I replace the partition by with Patient_Number, the totals are correct but the Medians are slightly off. When I keep what you have in your updated query and only remove "where ra between rd-1 and rd + 1 ", then totals are correct now but medians are off. Any ideas? I think some tweaking is needed? – Monks Den Jan 06 '22 at 21:48
  • @MonksDen your desired output is grouped by Location field, isn't it? – Farshid Shekari Jan 06 '22 at 21:55
  • @MonksDen which field do you want to use in your median calculation? and which field do you want to use in a grouping? – Farshid Shekari Jan 06 '22 at 22:00
  • Median should be WaitTime_Min and grouping is Location. I only changed it before because I was seeing the totals off. That fixed the totals but then as the data was coming in, I saw the medians were not correct. For example, if I use your query w/o any changes, I get total of only ONE WALK location patient, where in reality there are 13 according to the raw data. But when I replace that with the Patient_Number, the totals reflect the correct number but then Medians are off. – Monks Den Jan 06 '22 at 22:03
  • @MonksDen Maybe your WaitTime_Min field maybe have null values. – Farshid Shekari Jan 07 '22 at 07:40