0

I have a temporary table #input which holds an ID and a calculation date cdate. I have another temporary table #service which holds a service record for each ID. The field ‘countable’ on the #service table corresponds to whether a period of service counts or not. What I need to do is create a result table with a list of 1200 numbered days back from the date in the #input table for each ID and cdate combination, only including those where the given date falls within a period in #serv where countable = 1.

I can achieve this with a CTE and calendar table and ROW_NUMBER() as shown below, but when we are dealing with 1000s of IDs it takes a very long time, as the SERV table has data often going back decades for each ID, resulting in a huge CTE result set that is then filtered down to only include where RN <= 1200.

It seems like a solution using a cursor/while loop/recursive CTE might be preferable here in terms of performance, as processing could be stopped once the required 1100 rows are hit, or the beginning of the service is hit for a given ID & cdate?

I'm using SQL Server 2019.

Table setup scripts and method using cte & row_number & calendar table:


drop table if exists #input, #service, #calendar, #result
go

create table #input (id int, cdate date)
insert into #input
values (1,'31 mar 2022'), (2, '26 feb 2023'), (3, '01 apr 2023')

create table #service (id int, start_date date, end_date date, countable bit)
insert into #service values
(1,'15 Nov 1978','30 Jul 2008',1),
(1,'31 Jul 2008','31 Jul 2008',0),
(1,'01 Aug 2008','19 Aug 2008',1),
(1,'20 Aug 2008','20 Aug 2008',0),
(1,'21 Aug 2008','29 Jun 2011',1),
(1,'30 Jun 2011','30 Jun 2011',0),
(1,'01 Jul 2011','29 Aug 2012',1),
(1,'30 Aug 2012','30 Nov 2012',0),
(1,'01 Dec 2012','19 Mar 2013',1),
(1,'20 Mar 2013','20 Mar 2013',0),
(1,'21 Mar 2013','12 May 2013',1),
(1,'13 May 2013','14 May 2013',0),
(1,'15 May 2013','09 Jul 2014',1),
(1,'10 Jul 2014','10 Jul 2014',0),
(1,'11 Jul 2014','31 Mar 2015',1),
(1,'01 Apr 2015','16 Jul 2017',1),
(1,'17 Jul 2017','30 Jul 2017',0),
(1,'31 Jul 2017','15 Jul 2018',1),
(1,'16 Jul 2018','29 Jul 2018',0),
(1,'30 Jul 2018','28 Jul 2019',1),
(1,'29 Jul 2019','11 Aug 2019',0),
(1,'12 Aug 2019','02 Aug 2020',1),
(1,'03 Aug 2020','16 Aug 2020',0),
(1,'17 Aug 2020','31 Oct 2020',1),
(1,'01 Nov 2020','04 Jul 2021',1),
(1,'05 Jul 2021','18 Jul 2021',0),
(1,'19 Jul 2021','10 Jul 2022',1),
(1,'11 Jul 2022','24 Jul 2022',0),
(1,'25 Jul 2022','31 Jan 2023',1),
(1,'01 Feb 2023','01 Feb 2023',0),
(1,'02 Feb 2023','27 Apr 2023',1),
(1,'28 Apr 2023','28 Apr 2023',0),
(1,'29 Apr 2023','31 Dec 2999',1),
(2,'04 Jan 1984','28 Jan 2004',1),
(2,'29 Jan 2004','30 Jan 2004',0),
(2,'31 Jan 2004','30 Nov 2004',1),
(2,'01 Dec 2004','31 Jul 2006',1),
(2,'01 Aug 2006','29 Jun 2011',1),
(2,'30 Jun 2011','30 Jun 2011',0),
(2,'01 Jul 2011','29 Nov 2011',1),
(2,'30 Nov 2011','30 Nov 2011',0),
(2,'01 Dec 2011','19 Mar 2013',1),
(2,'20 Mar 2013','20 Mar 2013',0),
(2,'21 Mar 2013','06 Jun 2013',1),
(2,'07 Jun 2013','07 Jun 2013',0),
(2,'08 Jun 2013','09 Jul 2014',1),
(2,'10 Jul 2014','10 Jul 2014',0),
(2,'11 Jul 2014','14 Oct 2014',1),
(2,'15 Oct 2014','15 Oct 2014',0),
(2,'16 Oct 2014','30 Nov 2015',1),
(2,'01 Dec 2015','30 Jan 2017',1),
(2,'31 Jan 2017','31 Jan 2017',1),
(2,'01 Feb 2017','01 Oct 2022',1),
(2,'02 Oct 2022','08 Nov 2022',1),
(2,'09 Nov 2022','26 Feb 2023',0),
(3,'15 jan 2022','30 apr 2022',1),
(3,'01 may 2022','31 may 2022',0),
(3,'01 jun 2022','31 dec 2999',1)

create table #calendar(calendardate date primary key)

declare @dt as date = '01 jan 1970'

while @dt <= '31 dec 2030'
begin
    insert into #calendar values (@dt)
    set @dt = dateadd(d,1,@dt)
end

;with cte as (
    select row_number() over (partition by i.id, i.cdate order by calendardate desc) rn, i.id, i.cdate, calendardate
    from #input i 
    inner join #calendar c
        on c.calendardate <= i.cdate --only count where calendardate <= calculation date 
    inner join #service s 
        on s.id = i.id
        and c.calendardate between s.start_date and s.end_date
        and s.countable = 1 --calendardate within countable period of service
)
select *
into #result 
from cte
where rn <= 1200

I get the correct result using ROW_NUMBER and a calendar table, but I feel there could be a faster method that doesn't involve joining #service to #CALENDAR in it's entirety.

Dale K
  • 25,246
  • 15
  • 42
  • 71
vinstra_82
  • 36
  • 3
  • 7
    FYI a tally would be *way* faster than a `WHILE` loop for generating your dates. – Thom A Jul 07 '23 at 16:06
  • 2
    Another method to generate dates without a tally table or recursive CTE is a set-based `CROSS JOIN` like [this example](https://stackoverflow.com/questions/48776162/efficient-way-to-generate-2-billion-rows-in-sql-server-2014-developer). – Dan Guzman Jul 07 '23 at 16:30
  • This would be a simple join if you already had a calendar table. General purpose calendar tables are useful in so many ways because they typically include precalculated data for things like day of week, ISO weeks, quarters, etc. – AlwaysLearning Jul 08 '23 at 00:35

1 Answers1

1

You can try restructuring your query to:

  1. First join the #input and #service tables, limited to those service rows at or before the input cutoff date.
  2. Calculate a (possibly truncated) date range for each selected service row.
  3. Join to the #calendar table for dates within the calculated date range (between the calculated from and thru dates). This should yield a relatively efficient range seek into the calendar table.

Something like:

;with cte as (
    select
        row_number() over (partition by i.id, i.cdate order by calendardate desc) rn,
        i.id, i.cdate, c.calendardate
    from #input i 
    inner join #service s 
        on s.id = i.id
        and s.start_date <= i.cdate
        and s.countable = 1 -- countable period of service
    cross apply (
        select
            s.start_date as from_date,
            case when s.end_date <= i.cdate then s.end_date else i.cdate end as thru_date
    ) dr
    inner join #calendar c
        on c.calendardate between dr.from_date and dr.thru_date -- include dated within range
)
select *
into #result
from cte
where rn <= 1200

If this were SQL Server 2022 or later, you could eliminate the calendar and use the GENERATE_SERIES() function to expand the dates within each selected range. The LEAST() function also simplified the thru_date calculation.

-- SQL Server 2022 or later
;with cte as (
    select
        row_number() over (partition by i.id, i.cdate order by calendardate desc) rn,
        i.id, i.cdate, c.calendardate
    from #input i 
    inner join #service s 
        on s.id = i.id
        and s.start_date <= i.cdate
        and s.countable = 1 --calendardate within countable period of service
    cross apply (
        select
            s.start_date as from_date,
            least(s.end_date, i.cdate) as thru_date
    ) dr
    cross apply (
        select dateadd(day, s.value, dr.from_date) as calendardate
        from generate_series(0, datediff(day, dr.from_date, dr.thru_date)) s
    ) c
)
select *
into #result
from cte
where rn <= 1200

See this db<>fiddle for a working demo.

For the given test data, the query select id, count(*) cnt FROM #result group by id order by id yields the following summary for the original and both versions of the code above.

id cnt
1 1200
2 1200
3 411

I recommend that you examine the execution plan both the original and suggested alternatives above to determine if one shows fewer intermediate rows processed than the other. You will likely need a much larger set of test data to see any significant performance differences.

T N
  • 4,322
  • 1
  • 5
  • 18