1

I have a table I'm trying to create that has a column that needs to be calculated based on the row above it multiplied by the previous column. The first row is defaulted to 100,000 and the rest of the rows would be calculated off of that. Here's an example:

Age Population Deaths DeathRate DeathPro DeathProb SurvivalProb PersonsAlive
0 1742 0 0 0.1 0 1 100,000
51 2048 1 0.00048 0.5 0.00048 0.99951 99951.18379
52 1921 0 0 0.5 0 1 99951.18379
61 1965 1 0.00051 0.5 0.00051 0.99949 99900.33

I skipped some ages so I didn't have type it all in there, but the ages go from 0 - 85. This was orginally done in excel and the formula for PersonsAlive (which is what I'm trying to recreate) was G3*H2 aka previous value of PersonsAlive * Survival Probability.

I was thinking I could accomplish this with the lag function, but with the example I provided above, I get null values for everything after age 1 because there is no value in the previous row. What I want to happen is that PersonsAlive returns 100,000 until I get a death (in the example at Age 51) and then it does the calculation and returns the value (99951) until another death happens (Age 61). Here's my code, which includes two extra columns, ZipCode (the reason we want to do it in SQL is so we can calculate all zips at once) and PersonsAliveTemp, which I used to set Age 0 to 100,000:

SELECT 
    ZipCode
    ,Age
    ,[Population]
    ,Deaths
    ,DeathRate
    ,Death_Proportion
    ,DeathProbablity
    ,SurvivalProbablity
    ,PersonsAliveTemp
    ,(LAG(PersonsAliveTemp,1) OVER(PARTITION BY ZipCode ORDER BY Age))*SurvivalProbablity as PersonsAlive
FROM #temp4

I also tried it with defaulting PersonsAliveTemp to 100,000 and 0, which "works" but doesn't do the running calculation.

Is it possible to get the lag function (or some other function) to do a running row by row calc?

Dale K
  • 25,246
  • 15
  • 42
  • 71

1 Answers1

1

This converts a running product into an addition via logarithms.

select *,
    100000 * exp(sum(log(SurvivalProb)) over
                    (partition by ZipCode order by Age
                     rows between unbounded preceding and current row)
                ) as PersonsAlive
from data
order by Age;

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=36be4d66260c74196f7d36833018682a

shawnt00
  • 16,443
  • 3
  • 17
  • 22
  • An explanation of how a running multiplication works would be useful. You probably also want `rows unbounded preceding, otherwise you will get incorrect results if there are duplicate `Age` values (it's also faster) – Charlieface Aug 24 '22 at 06:17
  • @Charlie I was hoping for a little feedback on whether this was even the right direction. But yes I should get in the habit of specifying a rows window in general. – shawnt00 Aug 24 '22 at 06:19
  • You can get in trouble with this if negative values are involved, but generally it's a good approach. You also need to make sure your data type has enough precision – Charlieface Aug 24 '22 at 13:07
  • Thanks @shawnt00, but that's not quite what I'm looking for. I'm trying to accomplish a running formula, not running multiplication, so it looks like I'll have to figure out how to do a recursive CTE. Only Age 0 should have the PersonsAlive = 100,000, after that each age is the previous age's PersonsAlive multiplied by that age's SurvivalProb – Lucas Sienk Aug 24 '22 at 19:45
  • @Lucas Your iterated formula is just a series of multiplications. The values match up with itif you take a look at the link. – shawnt00 Aug 24 '22 at 19:48