0

I recently asked a question regarding CTE's and using data with no true root records (i.e Instead of the root record having a NULL parent_Id it is parented to itself)

The question link is here; Creating a recursive CTE with no rootrecord

The answer has been provided to that question and I now have the data I require however I am interested in the difference between the two approaches that I THINK are available to me.

The approach that yielded the data I required was to create a temp table with cleaned up parenting data and then run a recursive CTE against. This looked like below;

Select CASE
    WHEN Parent_Id = Party_Id THEN NULL
    ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
INTO #Parties
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1),

WITH linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM #Parties
WHERE Act_Parent_Id IS NULL

UNION ALL

Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM #Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)

Select *
FROM linkedParties
Order By Level

I also attempted to retrieve the same data by defining two CTE's. One to emulate the creation of the temp table above and the other to do the same recursive work but referencing the initial CTE rather than a temp table;

WITH Parties
AS
(Select CASE
    WHEN Parent_Id = Party_Id THEN NULL
    ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1),

linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM Parties
WHERE Act_Parent_Id IS NULL

UNION ALL

Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)

Select *
FROM linkedParties
Order By Level

Now these two scripts are run on the same server however the temp table approach yields the results in approximately 15 seconds.

The multiple CTE approach takes upwards of 5 minutes (so long in fact that I have never waited for the results to return).

Is there a reason why the temp table approach would be so much quicker?

For what it is worth I believe it is to do with the record counts. The base table has 200k records in it and from memory CTE performance is severely degraded when dealing with large data sets but I cannot seem to prove that so thought I'd check with the experts.

Many Thanks

Community
  • 1
  • 1

2 Answers2

0

Well as there appears to be no clear answer for this some further research into the generics of the subject threw up a number of other threads with similar problems.

This one seems to cover many of the variations between temp table and CTEs so is most useful for people looking to read around their issues;

Which are more performant, CTE or temporary tables?

In my case it would appear that the large amount of data in my CTEs would cause issue as it is not cached anywhere and therefore recreating it each time it is referenced later would have a large impact.

Community
  • 1
  • 1
0

This might not be exactly the same issue you experienced, but I just came across a few days ago a similar one and the queries did not even process that many records (a few thousands of records).

And yesterday my colleague had a similar problem.

Just to be clear we are using SQL Server 2008 R2.

The pattern that I identified and seems to throw the sql server optimizer off the rails is using temporary tables in CTEs that are joined with other temporary tables in the main select statement.

In my case I ended up creating an extra temporary table.

Here is a sample.

I ended up doing this:

SELECT DISTINCT st.field1, st.field2
  into #Temp1
FROM SomeTable st
WHERE st.field3 <> 0

select x.field1,  x.field2
FROM #Temp1 x inner join #Temp2 o 
    on x.field1 = o.field1
order by 1, 2

I tried the following query but it was a lot slower, if you can believe it.

with temp1 as (
 DISTINCT st.field1, st.field2
    FROM SomeTable st
    WHERE st.field3 <> 0
)
select x.field1,  x.field2
FROM temp1 x inner join #Temp2 o 
    on x.field1 = o.field1
order by 1, 2

I also tried to inline the first query in the second one and the performance was the same, i.e. VERY BAD.

SQL Server never ceases to amaze me. Once in a while I come across issues like this one that reminds me it is a microsoft product after all, but in the end you can say that other database systems have their own quirks.

boggy
  • 3,674
  • 3
  • 33
  • 56