0

I have a school graduation data set by year and subgroup and have been provided the numerator and denominator and the single year graduation rate but I also need to calculate a 3 year moving average. I was advised by a statistician that no longer works with us that to do this I needed to get the running total for the numerator for 3 years and the running total for 3 years for the denominator. I understand the math behind it and have checked my work by hand and via excel with a few subgroups. I have also calculated this using T-SQL with no problem so long as there are no null records but I’m struggling with the calculation when there are nulls or 0.

I have tried running the query accounting for null by using NULLIF

   ID,
   Bldg,
   GradClass,
   Sbgrp ,
   TGrads,
   TStus,
   Rate,
/*Numerator Running total*/
   SUM (TGrads) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS NumSum,
/*Denominator Running Total*/
   SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS DenSum,
/*Moving Year Average*/
   (
       ( SUM ( TGrads) OVER ( partition BY DistrictID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ) / NULLIF ( ( SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ), 0 ) * 100 
   ) AS 3yrAvg 
FROM
   KResults.DGSRGradBldg

First question, I was provided a record for all subgroups even if they didn’t have students in the subgroup. I want to keep the record so that all subgroups are accounted for within the district and since I know that they didn’t have data, can I substitute the Null values in Tgrads, TStus with a 0? If I do substitute those values with a 0 how can I show the rate as null?

Second question how can I compute the rate with either a null or 0 denominator? I understand you can’t divide by 0 but I want to maintain the record so it’s easy and clear to see that they had no data. How can I do this? When I try to calculate this without accounting for Null I get errors, 1.)Divide by zero error encountered. (8134) and 2.) Null value is eliminated by an aggregate or other SET operation. (8153).

Knowing I can’t divide by 0 or Null I modified my query to include NULLIF and when I do that the query runs with no errors but I don’t get accurate percentage for rates that are below 100%. All my rates are now either 100% or 0 - note the last row, the moving average of 2/3 is not 0.

Here’s what the data looks like if I try to account for nulls my Moving three year average shows as 0. Note the Moving three year Avg Column shows all 0.

ID   Bldg    Class   Sbggrp  TGrads  TStus      Rate    NumSum DenSum  3yrAvg  
A    1      2014       A1    46      49         93.9    46     49       0
A    1      2015       A1    41      46         89.1    87     95       0
A    1      2016       A1    47      49         95.9    134    144      0
A    1      2017       A1    38      40         95.0    126    135      0
A    1      2018       A1    59      59         98.3    143    148      0
A    1      2014       A2    1       1          100     1      1        100
A    1      2015       A2                               1      1        100
A    1      2016       A2                               1      1        100
A    1      2017       A2    2       3          66.7    2      3        0
A    1      2018       A2    2       2          100     4      5        0

Any advice would be appreciated but please provide suggestions kindly to this newbie. Thanks for your time and help.

KP14
  • 3
  • 2
  • I can't figure out what you are actually asking. Perhaps you could ask another question that is a bit simpler. – Gordon Linoff Jun 15 '19 at 01:46
  • Its unclear to me as well. Appears that 3yrAvg NULLIF is resulting in only in 1 or 0 and then multiplying by 100 to generate results. – alexherm Jun 15 '19 at 02:06
  • Thanks for asking for clarification. I need to get the 3 year rate but if there are null records than my rate returns an error b/c it's dividing by null. So how can I get the rate if I have null records? I can't exclude the null records b/c folks I'm working for want to see all records regardless. I'm only trying to multiply at the end to get a percent. Forget the 3 year rate if it's easier - how can I take one column and divide by another and not throw an error if there are nulls? My current method isn't providing the results I need. – KP14 Jun 15 '19 at 03:39

1 Answers1

0

Answer to question 1: put in the select condition

ISNULL(TGrads,0) AS TGRADS,
ISNULL(TStus,0) AS TSTUS,

Answer to question 2: I'd do this

(CASE WHEN SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) IS NOT NULL
AND SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) <>0
THEN (SUM(TGrads) OVER ( partition BY DistrictID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row )  / (SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ) ) * 100
ELSE NULL END
   ) AS 3yrAvg 

I put null after "ELSE"...You can choose your default value.

Gufus
  • 418
  • 2
  • 7
  • 15
  • 1
    Thanks for the help as your answer to number 1 worked well but answer to number 2 still doesn't produce the correct percents. For example if the running numerator is 3 and the running denominator is 3, than the moving average is 100% and that's accurate but when the percent is anything other than 100% it produces 0, such as 46/49 the percent shows as 0. How can I get the moving average of 46/49 as an example to = 93.87 instead of 0? – KP14 Jun 15 '19 at 23:15
  • the problem could be that both numerator and denominator are integers. Try to look here https://stackoverflow.com/questions/3443672/integer-division-in-sql-server – Gufus Jun 16 '19 at 07:44
  • ```@Gufus```I think you were spot on. The fields were stored as integer. When setting to float the math works. Is there a way to set the precision and scale on float division? I'd like only 2 digits behind the decimal and I'm getting 13. Also, is there any reason why this query would run so slow (41 seconds) with only 34000 records? – KP14 Jun 17 '19 at 18:42
  • I'm not an expert in choosing the best numeric format, but reading this https://stackoverflow.com/questions/1056323/difference-between-numeric-float-and-decimal-in-sql-server I think you should use decimal (decimal(9,2) seems good!). I hope it helps to get the query faster, but even for few records you can have slow queries expecially if you use order by conditions. Have you tried to use a temporary table? Maybe it gives some benefits... – Gufus Jun 17 '19 at 20:43