0

I am trying to calculate age-range based on Customer's date-of-birth. Data for this source comes into staging, where I call this following stored procedure. And a batch size generally is 500K to 2M here.

 CREATE PROCEDURE [spCustomer_UpdateStageAgeRange]
 AS
 BEGIN
    DECLARE @Today date = getdate()

    update stg.tCustomer_Merge set [CALC_Age_Range] = 
        CASE 
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 13 THEN '<13'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 18 THEN '13-17'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 22 THEN '18-21'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 26 THEN '22-25'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 34 THEN '26-33'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) < 45 THEN '34-44'
            WHEN datediff(Year, CAST([DOB] AS date), @Today) >= 45 THEN '45+'
            ELSE 'Unknown'
        END
End

Problem : Even for some valid DOB, this stored procedure calculates age-range as 'Unknown'. We are not able to re-generate this for smaller data set, so is this a big volume problem? Or any other reason?

Context: In 28M records, 230K are having 'Unknown'

Note: We are using Azure-sql-server. DOB is a date field in table stg.tCustomer_Merge. Manual runs of a DOB works, it's only in large batches we see this issue.

Shanta
  • 262
  • 1
  • 4
  • 15
  • can you share some of your sample data which works out the case as "unknown"? Is the DOB field a type of Date as well? – Harry Mar 31 '20 at 03:56
  • Yes I have mentioned that it is date field. Sample -1979-06-23 Unknown 1991-11-18 Unknown 1971-05-20 Unknown 1978-11-12 Unknown 1973-04-09 Unknown 1978-02-08 Unknown 1967-10-10 Unknown 1983-05-13 Unknown 1980-12-14 Unknown 1961-06-23 Unknown – Shanta Mar 31 '20 at 04:14
  • OK.. so if DOB is already a date.. why are you casting it as a date? Oddly enough, on my testing.. those sample dates seem to work just fine with the case statement. Could be something to do with Azure perhaps? Can you try casting your @Today date = cast(getdate() as date) if that makes any difference? – Harry Mar 31 '20 at 19:34
  • To cast was an attempt to fix the inconsistency. Yes, it works manually run. the problem is when it runs in batch as mentioned in description. – Shanta Apr 01 '20 at 14:10
  • Just as an aside, are you aware of how `DATEDIFF` works with date boundaries. E.g. `DATEDIFF(year,'20191231','20200101')` returns 1 - despite that person only being 1 day old. Is that what you want? It's often easier to work backwards (add negative numbers of years to `@Today` and compare that to the DOB) – Damien_The_Unbeliever Apr 01 '20 at 14:22
  • Yep. See https://stackoverflow.com/questions/1572110/how-to-calculate-age-in-years-based-on-date-of-birth-and-getdate – David Browne - Microsoft Apr 01 '20 at 14:26
  • We have corrected the year calculation using 'datediff(DAY, [DOB], @Today))/365.25' But the error of 'Unknown' still appears for some of valid dates. the question is about volume of data, when large data is processed, few hundred records are not calculated correctly ! is it a limitation of SQL server ? – Shanta Aug 10 '20 at 18:28
  • This did not get solved by Microsoft too. We moved our age-range calculation to Databricks and now consuming the calculated value directly into SQL – Shanta Sep 04 '20 at 16:57

0 Answers0