0

I am trying to find the monthly median of my category EVS. To do this I need to summarize all the events of each logical device name. However, This needs to be done on a monthly basis. I have two queries the first one, which I will paste below summarizes all the events for that robot on a daily basis but I need to further summarize to make it a monthly basis. Also I need to do this on a monthly basis over multiple years.

select logicalDeviceName, Sum(Events) as consolidatedEvents, EVS, StartDate
From report.DisinfectionStatsCombined
group By LogicalDeviceName, EVS, StartDate
Order By EVS 

The outcome is this

This is a sample of my outcome, as you can see I need a single row to be apollo with a sum of the all the consolidated events for that one month

Furthermore, here is my code for attempting to calculate the median with a sample of the outcome.

select logicalDeviceName, Sum(Events) as consolidatedEvents, EVS, StartDate,
PERCENTILE_CONT(0.5) within group (order by Sum(Events))
OVER (Partition BY year(StartDate),Month(StartDate),EVS) AS MedianCont
From report.DisinfectionStatsCombined
group By LogicalDeviceName, EVS, StartDate,Events
Order By EVS 

As you can see the outcome is far from desirable and the consolidated events numbers change to be incorrect. I think the numbers from the first query are correct but then the change significantly here, why is that?

Again, I am trying to calculate the median, by summarizing all the consolidated events by each logical device name. So per month there should only be one of each name. Then, I will use this information to calculate median for each evs on a monthly basis. There is 4 evs types, I will provide a snip of another type. I feel I might have to change the grouping, in order to show the different types of evs. Ideally, I would like to only have it grouped by month/year. the dates go all the way back to 2012.

picture to show example of the other evs

Below is a sample of what I want my desired outcome to be sample outcome and data

tonip
  • 13
  • 2

1 Answers1

0

My understanding of your goals:

  1. Calculate the count of events per LogicalDeviceName each month
  2. Calculate the median number of events each month for each LogicalDeviceName
  3. Calculate the median number of events each month for each EVS

My working assumption is that for any given LogicalDeviceName there is only one matching EVS. For example, when the LogicalDeviceName value is "Apollo", the EVS value will always be "Commercial".

Note that I created and populated test table to get screenshots below. Script included below.

Goal 1: Aggregate the count of events per LogicalDeviceName per month to answer questions like "how many events occurred for Apollo in January 2015?". A query like this should do the trick. Separating out the year and month parts from the day allows the month aggregate:

SELECT 
      LogicalDeviceName
    , SUM(Events) [ConsolidatedEvents]
    , EVS
    , FORMAT(StartDate, 'yyyy/MM') [YearAndMonth]
FROM [Test]
GROUP BY 
      LogicalDeviceName
    , EVS
    , FORMAT(StartDate, 'yyyy/MM')
ORDER BY 
      YearAndMonth DESC
    , LogicalDeviceName

Screenshot of above query results

Goal 2: Calculate the median number of events each month for each LogicalDeviceName

SELECT DISTINCT
      LogicalDeviceName
    , PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Events) 
        OVER (PARTITION BY LogicalDeviceName, YEAR(StartDate), MONTH(StartDate)) [Median]
    , FORMAT(StartDate, 'yyyy/MM') [YearAndMonth]
FROM [Test]
ORDER BY [YearAndMonth] DESC, LogicalDeviceName

Screenshot of above query results

Need to use DISTINCT because PARTITION BY includes a row for each record. Also note that while this syntax is easier, there are potentially more performant ways to calculate median.

Goal 3: Calculate the median number of events each month for each EVS using the "consolidated" (SUM) of events for each LogicalDeviceName

This is where I'm again a bit foggy about what you're trying to accomplish. Will update if further clarification provided. Edit below based on provided screenshot of before/after sets. Off the top of my head, CTE building off of the first query is easiest way to accomplish:

WITH Consolidated AS
(
    SELECT 
          LogicalDeviceName
        , SUM(Events) [ConsolidatedEvents]
        , EVS
        , FORMAT(StartDate, 'yyyy/MM') [YearAndMonth]
    FROM [Test]
    GROUP BY 
          LogicalDeviceName
        , EVS
        , FORMAT(StartDate, 'yyyy/MM')
)
SELECT DISTINCT
    EVS
    , PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ConsolidatedEvents)
        OVER (PARTITION BY EVS, YearAndMonth) [Median]
    , YearAndMonth
FROM Consolidated
ORDER BY YearAndMonth DESC, EVS

Results of above query

Test Table Create/Populate Script:

SET NOCOUNT ON
GO
CREATE TABLE [Test]
(
      LogicalDeviceName VARCHAR(64)
    , Events INT
    , EVS VARCHAR(16)
    , StartDate DATETIME2
)

GO

DECLARE @LDN_EVS_Pairs TABLE
(
      LDN VARCHAR(16)
    , EVS VARCHAR(16)
)

INSERT INTO @LDN_EVS_Pairs(LDN, EVS)
VALUES
      ('Apollo', 'Commercial')
    , ('Appleton1', 'Commercial')
    , ('Baptist Beaches', 'Sodexo')
    , ('Florida Hospital', 'Commercial')
    , ('FROST', 'VA/DoD')
    , ('FVAMC1', 'VA/DoD')
    , ('GERMN8R', 'Commercial')
    , ('Glady', 'Commercial')
    , ('Sheldon', 'Sodexo')

DECLARE 
      @Counter INT = 10000
    , @Multiplier INT
    , @CurrentLDN VARCHAR(16)
    , @CurrentEvents INT
    , @CurrentEVS VARCHAR(16)
    , @CurrentStartDate DATETIME2
    , @MinEvents INT = 0
    , @MaxEvents INT = 50
    , @MinDate DATE = '20120101'
    , @MaxDate DATE = '20200707'

WHILE (@Counter > 0)
BEGIN
    SELECT TOP(1)
          @Multiplier = ABS(CHECKSUM(NEWID()) % (@MaxEvents/2 - @MinEvents + 1)) + @MinEvents
        , @CurrentLDN = LDN
        , @CurrentEvents = ABS(CHECKSUM(NEWID()) % (@MaxEvents - @MinEvents + 1)) + @MinEvents
        , @CurrentEVS = EVS
        , @CurrentStartDate = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % (1+DATEDIFF(DAY,@MinDate,@MaxDate)),@MinDate)
    FROM @LDN_EVS_Pairs
    ORDER BY NEWID()

    WHILE(@Multiplier > 0)
    BEGIN
        INSERT INTO [Test](LogicalDeviceName, Events, EVS, StartDate)
        VALUES(@CurrentLDN, @CurrentEvents, @CurrentEVS, @CurrentStartDate)
        
        SET @Multiplier -= 1
    END
    SET @Counter -= 1
END

A little help was provided from this SO Post for generating random values.

Ruben Helsloot
  • 12,582
  • 6
  • 26
  • 49
Collin
  • 176
  • 3
  • Hi, thank you for attempting to answer. I might have done a poor job explaining. But what I am trying to do is the following.1) I need to calculate the consolidated events per logical device name for each month in every year inside the data set. There will be instances from january 2012 to january 2020. 2) after accomplishing this, I need to get the median of those consolidated events for each of the 4 evs providers for each month in every year in the dataset. Also month and year need to be in the same column. – tonip Jul 07 '20 at 21:18
  • But thank you so much this in and of itself was very helpful!!!!! – tonip Jul 07 '20 at 21:27
  • the first table you made was my desired outcome, from that point I would like to calculate the median of the consolidatedevents for each evs for each month. In otherwords my final outcome needs to be the median of the evs for that month/year. So I want to pick the logical device number that is in the middle of that group by evs for that one month. – tonip Jul 07 '20 at 21:37
  • I edited to have year and month date parts in one column. Regarding final outcome, I'm still struggling to understand. Please provide small starting example dataset and desired outcome from that dataset to help understanding. – Collin Jul 07 '20 at 22:20