Slow performance on Recursive CTE in T-SQL

Question

I have the following code, (based on the community member help):

      use [Credible];

      WITH DataSource AS
      (
       SELECT
             ROW_NUMBER() OVER(ORDER BY epi.[client_id]) AS [row_id]
            ,epi.[client_id]
            ,CONVERT(date, epi.[admission_date]) AS [admission_date]
            ,CONVERT(date, DATEADD(MONTH, 3, epi.[admission_date])) AS [3Month Date]
            ,CONVERT(date, ISNULL(epi.[discharge_date], GETDATE())) AS [discharge_date]

       FROM 
           [dbo].[ClientEpisode] epi

       WHERE DATEADD(MONTH, 3, [admission_date]) <= ISNULL([discharge_date], GETDATE()) 
  ),
    RecursiveDataSource AS
     (
      SELECT 
            [row_id]
           ,[client_id]
           ,[admission_date]
           ,[3Month Date]
           ,[discharge_date]
           ,0 AS [level]

     FROM 
         DataSource

         UNION ALL

     SELECT 
          ds.[row_id]
         ,ds.[client_id]
         ,ds.[admission_date]
         ,DATEADD(MONTH, 3, rds.[3Month Date])
         ,ds.[discharge_date]
         ,[level] + 1

    FROM 
          RecursiveDataSource rds
          INNER JOIN DataSource ds ON 
          rds.[row_id] = ds.[row_id] AND DATEADD(MONTH, 3, rds.[3Month Date]) < ds.[discharge_date]
    )

     SELECT *

     FROM RecursiveDataSource 

     ORDER BY [row_id]
             ,[level]
     -- OPTION (MAXRECURSION 32767);

This code works with the speed around 30 sec if there are up to 1 000 records in the table.

But my table is over 14 000 records, and will grow more, and the code works 10+++ min

Is there way to make its performance in 30 sec or so?

Thank you for help

I think I would first take the query in DataSource and split it out into a temp table and then try running the CTE against that. — JMabee, Oct 21 '20 at 09:51
I answered this here but moved the answer to the original question as it was really a response to the original question (e.g., doing the calculation a different way, rather than optimising this CTE). I've put it in https://stackoverflow.com/questions/64383302/need-to-add-3-months-to-each-value-within-a-column-based-on-the-1st-3-months/64462525#64462525 — seanb, Oct 21 '20 at 11:26

gotqn · Accepted Answer · 2020-10-21T14:25:34.080

As here you are looking for performance solution, I will advice to skip using recursive CTE and solve the issue in another way.

For example, a variant of this:

DECLARE @DataSource TABLE
(
    [client_id] INT
   ,[adm_date] DATE
   ,[disch_date] DATE
);

INSERT INTO @DataSource ([client_id], [adm_date], [disch_date])
VALUES (1002, '3/11/2005 ', '5/2/2005')
      ,(1002, '8/30/2005 ', '2/16/2007')
      ,(1002, '3/16/2017 ', NULL);

DROP TABLE IF EXISTS #DataSource;
DROP TABLE IF EXISTS #DataNumers;

CREATE TABLE #DataSource
(
    [client_id] INT
   ,[adm_date] DATE
   ,[disch_date] DATE
   ,[diff_in_months] INT
);

CREATE TABLE #DataNumers
(
    [number] INT
);

DECLARE @max_diff_in_months INT;

INSERT INTO #DataSource ([client_id], [adm_date], [disch_date], [diff_in_months])
SELECT [client_id]
      ,[adm_date]
      ,ISNULL([disch_date], GETUTCDATE()) AS [disch_date]
      ,CEILING(DATEDIFF(MONTH, [adm_date], ISNULL([disch_date], GETUTCDATE())) * 1.0 / 3.0) - 1
FROM @DataSource
WHERE DATEADD(MONTH, 3, [adm_date]) <= ISNULL([disch_date], GETUTCDATE());

SELECT @max_diff_in_months = MAX([diff_in_months])
FROM #DataSource;

INSERT INTO #DataNumers ([number])
SELECT TOP (@max_diff_in_months) ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM [master]..[spt_values];

SELECT DS.[client_id]
      ,DS.[adm_date]
      ,DATEADD(MONTH, DN.[number] * 3, [adm_date]) AS [3Month Date]
      ,DS.[disch_date]
FROM #DataSource DS
CROSS APPLY #DataNumers DN
WHERE DS.[diff_in_months] >= DN.[number];

The algorithm is the following:

select only the rows we needed
calculate the differences in months/3 for each row
generate numbers in helper table
generate the new rows using cross apply

You can use CROSS JOIN if you have more then 2.5k numbers to generate but I remove it in my example. Also, we are using [master]..[spt_values] to generate the total periods. But it's just one way for generating numbers in T-SQL. You can check the link and use another technique if you like.

The idea is to find the periods between each [adm_date] and [disch_date] - and a period is measure by 3 months - that's why we are using DATEDIFF with MONTH and then dividing by 3. We are using ceiling to get the upper value. Of course, you can generate more periods than needed and exclude them with WHERE clause in the final result.

If you include, say, `(1, '20010101', '20050115')` as a row in the datasource, it skips the last row (it should include 20050101). — seanb, Oct 21 '20 at 12:02
@seanb in the following row ,CEILING(DATEDIFF(MONTH, [adm_date], ISNULL([disch_date], GETDATE())) * 1.0 / 3.0) -1 if you remove "-1" it works! — Hell-1931, Oct 21 '20 at 12:47
@gotqn it worked perfectly! Questions about ,CEILING(DATEDIFF(MONTH, [adm_date], ISNULL([disch_date], GETDATE())) * 1.0 / 3.0) Could you explain it? And - what is [master]..[spt_values]? (I've found [[master].[dbo].spt_values in my DB) Greates thanks, as usual! — Hell-1931, Oct 21 '20 at 12:54
@Hell-1931 I explain in the edit. Could to share what's the performance now? — gotqn, Oct 21 '20 at 14:25
@gotqn And it absolutely worked - I just had to replace INSERT INTO values to INSERT INTO SELECT FROM myDBTable And - remove "-1" from CEILING(DATEDIFF(MONTH, [adm_date], ISNULL([disch_date], GETUTCDATE())) * 1.0 / 3.0) - 1 , so it would add up the 3M to the last row before dich-date > 3M date — Hell-1931, Oct 21 '20 at 20:25

Slow performance on Recursive CTE in T-SQL

1 Answers1