
In SQL Server, I have the following table (snippet) which is the source data I receive (I cannot get the raw table it was generated from).

    Gradelevel | YoS | Inventory
    -----------+-----+----------
    4          | 0   | 4000
    4          | 1   | 3500
    4          | 2   | 2000

The first row of the table says that for grade level 4, there are 4,000 people with 0 years of service (YoS).

I need to find the median YoS for each Grade level. This would be easy if the table wasn't given to me aggregated up to the Gradelevel/YoS level with a sum in the Inventory column, but sadly I'm not so lucky.

What I need is to ungroup this table so that I have a new table where the first record appears 4,000 times, the next record 3,500 times, the next 2,000 times, etc. (the Inventory column would not be in this new table). Then I could take PERCENTILE_DISC() of the YoS column by grade level and get the median. I could also then use other statistical functions on YoS to glean other insights from the data.
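(For reference, once the data is ungrouped, say into a hypothetical table #ungrouped with gradelevel and yos columns, the per-grade median would be something like this sketch:)

    -- Sketch: median YoS per grade level over a hypothetical ungrouped table #ungrouped(gradelevel, yos).
    -- PERCENTILE_DISC is a window function in SQL Server, hence DISTINCT to collapse to one row per grade.
    SELECT DISTINCT
           gradelevel,
           PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY yos)
               OVER (PARTITION BY gradelevel) AS median_yos
    FROM #ungrouped;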

So far I've looked at UNPIVOT (which doesn't appear to fit my use case), CTEs (I can't find an example close to what I'm trying to do), and a function that iterates through the above table, inserting the number of rows indicated by the Inventory value into a new table, which then becomes my 'ungrouped' table that I can run statistical analyses on. I believe the last approach is the best option available to me, but the examples I've seen all iterate over and focus on a single column of a table. I need to iterate through each row, then use the Gradelevel and YoS values to insert [Inventory] rows before moving on to the next row.

Is anyone aware of:

  1. A better way to do this other than the iteration/cursor method?
  2. How to iterate through a table to accomplish my goal? I've been reading Is there a way to loop through a table variable in TSQL without using a cursor? but am having a hard time figuring out how to apply that iteration to my use case.

Edit 10/3: here is the looping code I got working, which produces the same result as John's CROSS APPLY. The pro is that any statistical function can then be run on the result; the con is that it is slow.


    --this table will hold our row (non-frequency) based inventory data
    DROP TABLE IF EXISTS #tempinv
    CREATE TABLE #tempinv(
        amcosversionid INT IDENTITY(1,1) NOT NULL,  -- surrogate row id, so the insert below does not need to supply it
        pp  NVARCHAR(3) NOT NULL,
        gl  INT NOT NULL,
        yos INT NOT NULL
    )

    -- to transform the inventory frequency table to a row-based inventory we need to iterate through it
    DECLARE @MyCursor CURSOR, @pp AS NVARCHAR(3), @gl AS INT, @yos AS INT, @inv AS INT
    BEGIN
        SET @MyCursor = CURSOR FOR
            SELECT payplan, gradelevel, step_yos, SUM(inventory) AS inventory
            FROM mytable
            GROUP BY payplan, gradelevel, step_yos

        OPEN @MyCursor
        FETCH NEXT FROM @MyCursor
        INTO @pp, @gl, @yos, @inv

        WHILE @@FETCH_STATUS = 0
        BEGIN
            DECLARE @i INT
            SET @i = 1
            --insert a row into our new table for each person counted in inventory
            WHILE @i <= @inv
            BEGIN
                INSERT INTO #tempinv (pp, gl, yos) VALUES (@pp, @gl, @yos)
                SET @i = @i + 1
            END
            FETCH NEXT FROM @MyCursor
            INTO @pp, @gl, @yos, @inv
        END

        CLOSE @MyCursor
        DEALLOCATE @MyCursor
    END
boondocks5
  • I don't understand the ask. If you are required to calculate the median, by unpivoting you would have 9,500 rows for your sample, and the inventory weight would be just 1 for each row. The median of 9,500 rows would be at position 4,750 and, assuming your data is sorted by YoS, the median YoS would be 1. If this logic is to be followed, then you can simply do the calculation without unpivoting. – Neels Oct 01 '19 at 21:33
  • You can definitely find the median without doing what you describe. Just think that YoS 0 has a range from 0-4000, YoS 1's range starts at 4001 - 7500, and YoS 2's range starts at 7501 - 9500. As @Neels says, the median is at 4750, and 4750 falls within YoS 1's range of numbers. If you really want to do something like a recursive CTE, then do it with start/end range numbers instead of all of the rows. – avery_larry Oct 01 '19 at 21:53
  • I guess my problem is I don't understand how to do the calculation without ungrouping/unpivoting the data. I thought a recursive CTE might be an elegant solution but I'm not sure how one would go about doing that. I edited my post to show the looping code that works but it is slow. I like the concept you two have but 1) how would one go about doing that and 2) could other statistical functions (e.g. other percentiles, stdev, etc) be used on it as well? – boondocks5 Oct 03 '19 at 12:29
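A sketch of the range-based idea from the comments above, assuming the aggregated source table is named mytable with columns gradelevel, yos, and inventory: take a running total of inventory within each grade level and pick the first YoS whose cumulative count reaches half of the grade's total, which matches PERCENTILE_DISC(0.5) semantics without expanding any rows.

    -- Weighted median per grade level without expanding rows:
    -- find the smallest YoS whose cumulative inventory covers half of the grade's total.
    ;WITH c AS (
        SELECT gradelevel,
               yos,
               SUM(inventory) OVER (PARTITION BY gradelevel ORDER BY yos
                                    ROWS UNBOUNDED PRECEDING) AS running_inv,
               SUM(inventory) OVER (PARTITION BY gradelevel)  AS total_inv
        FROM mytable
    )
    SELECT gradelevel,
           MIN(yos) AS median_yos
    FROM c
    WHERE running_inv >= total_inv / 2.0
    GROUP BY gradelevel;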

1 Answer


One option is to use a CROSS APPLY in concert with an ad-hoc tally table. This will "expand" your data into N rows. Then you can perform any desired analysis you want.

Example

Select * 
 From YourTable A
 Cross Apply (
              Select Top ([Inventory]) N=Row_Number() Over (Order By (Select NULL)) 
               From  master..spt_values n1, master..spt_values n2
             ) B 

Returns

Grd Yos Inven   N
4   0   4000    1
4   0   4000    2
4   0   4000    3
4   0   4000    4
4   0   4000    5
...
4   0   4000    3998
4   0   4000    3999
4   0   4000    4000
4   1   3500    1
4   1   3500    2
4   1   3500    3
4   1   3500    4
...
4   1   3500    3499
4   1   3500    3500
4   2   2000    1
4   2   2000    2
4   2   2000    3
...
4   2   2000    1999
4   2   2000    2000
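
Once the data is expanded this way, the median per grade level can be taken directly; a sketch using the same table and column names as above:

    Select Distinct A.[Gradelevel]
          ,MedianYoS = PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY A.[YoS])
                           OVER (PARTITION BY A.[Gradelevel])
     From YourTable A
     Cross Apply (
                  Select Top ([Inventory]) N=Row_Number() Over (Order By (Select NULL))
                   From  master..spt_values n1, master..spt_values n2
                 ) B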
John Cappelletti
  • John, that definitely ungroups the data and is definitely shorthand for the insert loop I was doing. But what is 'master..spt_values'? I've never seen that before. – boondocks5 Oct 03 '19 at 12:19
  • @boondocks5 just an internal table in SQL Server. Actually any table of adequate size will do. – John Cappelletti Oct 03 '19 at 12:22
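
If master..spt_values is unavailable or you prefer not to depend on it, the same ad-hoc tally can be built from row constructors; a sketch along the lines of the answer above, covering Inventory values up to 10,000:

    Select *
     From YourTable A
     Cross Apply (
                  Select Top ([Inventory]) N=Row_Number() Over (Order By (Select NULL))
                   From       (Values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n1(n)  -- 10 rows
                   Cross Join (Values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n2(n)  -- x 10
                   Cross Join (Values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n3(n)  -- x 10
                   Cross Join (Values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n4(n)  -- x 10 = 10,000 rows max
                 ) B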