Why is this CTE so much slower than using temp tables?

Question

We had an issue since a recent update on our database (I made this update, I am guilty here), one of the query used was much slower since then. I tried to modify the query to get faster result, and managed to achieve my goal with temp tables, which is not bad, but I fail to understand why this solution performs better than a CTE based one, which does the same queries. Maybe it has to do that some tables are in a different DB ?

Here's the query that performs badly (22 minutes on our hardware) :

WITH CTE_Patterns AS (
SELECT 
    PEL.iId_purchased_email_list,
    PELE.sEmail
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
    INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH(NOLOCK) ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
),
CTE_Emails AS (
    SELECT 
        ILE.iId_newsletterservice_import_list, 
        ILE.iId_newsletterservice_import_list_email, 
        ILED.sEmail
    FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
        INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH(NOLOCK) ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
    WHERE ILE.iId_newsletterservice_import_list = 1000
)
SELECT I.iId_newsletterservice_import_list, 
        I.iId_newsletterservice_import_list_email, 
        BL.iId_purchased_email_list
FROM CTE_Patterns AS BL WITH(NOLOCK)
    INNER JOIN CTE_Emails AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail

When running both CTE queries separately, it's super fast (0 secs in SSMS, returns 122 rows and 13k rows), when running the full query, with INNER JOIN on sEmail, it's super slow (22 minutes)

Here's the query that performs well, with temp tables (0 sec on our hardware) and which does the eaxct same thing, returns the same result :

SELECT
    PEL.iId_purchased_email_list,
    PELE.sEmail
INTO #tb1
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
    INNER JOIN OtherDb.dbo.Purchased_Email_List_Email PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1

SELECT 
    ILE.iId_newsletterservice_import_list, 
    ILE.iId_newsletterservice_import_list_email, 
    ILED.sEmail
INTO #tb2
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
    INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000

SELECT I.iId_newsletterservice_import_list, 
        I.iId_newsletterservice_import_list_email, 
        BL.iId_purchased_email_list
FROM #tb1 AS BL WITH(NOLOCK)
    INNER JOIN #tb2 AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail

DROP TABLE #tb1
DROP TABLE #tb2

Tables stats :

OtherDb.dbo.Purchased_Email_List : 13 rows, 2 rows flagged bPattern = 1
OtherDb.dbo.Purchased_Email_List_Email : 324289 rows, 122 rows with patterns (which are used in this issue)
dbo.NewsletterService_import_list_email : 15.5M rows
dbo.NewsletterService_import_list_email_distinct ~1.5M rows
WHERE ILE.iId_newsletterservice_import_list = 1000 retrieves ~ 13k rows

I can post more info about tables on request.

Can someone help me understand this ?

UPDATE

Here is the query plan for the CTE query :

Execution plan with CTE

Here is the query plan with temp tables :

Execution plan with temp tables

@TimSchmelter I understand the point, but is the query optimizer making kinda bad job in the case of CTEs like this one ? In my sense, they should behave like small temp tables since I include a `WHERE` clause in each CTE that makes the set mush smaller. — MaxiWheat, Jun 10 '14 at 14:29
Sometimes the complexity of stacked CTEs just creates too many opportunities for the optimizer, and it can't reach one efficiently that would be best to satisfy the whole query. Sometimes out of date stats contribute to this, leading to bad choices. Dumping intermediate results into a #temp table can help by (a) reducing the complexity of any individual query and (b) providing real and accurate stats on subsets of the data. — Aaron Bertrand, Jun 10 '14 at 15:33
Also, instead of putting `WITH (NOLOCK)` on every table, set it at the session level - or better yet, use RCSI. — Aaron Bertrand, Jun 10 '14 at 15:33
i'll help, but only if you remove my email address from your database — swasheck, Jun 10 '14 at 15:39
Don't worry @swasheck, we don't sell lists and emails, we kick out clients who seems to have bought a list ;-) (this is what this query does, matches lists to patterns to reject them) — MaxiWheat, Jun 10 '14 at 15:43
I know this is old, but anyone else notices the Hash match and table scan in the CTE query plan? This indicates the lack of indexes required for the query. I bet that with correct indexes both queries will perform identically — nflash, Feb 05 '18 at 17:36

score 3 · Accepted Answer · answered Jun 11 '14 at 00:22

As you can see in the query plan, with CTEs, the engine reserves the right to apply them basically as a lookup, even when you want a join.

If it isn't sure enough it can run the whole thing independently, in advance, essentially generating a temp table... let's just run it once for each row.

This is perfect for the recursion queries they can do like magic.

But you're seeing - in the nested Nested Loops - where it can go terribly wrong.
You're already finding the answer on your own by trying the real temp table.

CodeCowboyOrg · Answer 2 · 2014-08-08T05:08:47.490

Parallelism. If you noticed in your TEMP TABLE query, the 3rd Query indicates Parallelism in both distributing and gathering the work of the 1st Query. And Parallelism when combining the results of the 1st and 2nd Query. The 1st Query also incidentally has a relative cost of 77%. So the Query Engine in your TEMP TABLE example was able to determine that the 1st Query can benefit from Parallelism. Especially when the Parallelism is Gather Stream and Distribute Stream, so its allowing the divying up of work (join) because the data is distributed in such a way that allows for divying up the work then recombining. Notice the cost of the 2nd Query is 0% so you can ignore that as no cost other than when it needs to be combined.

Looking at the CTE, that is entirely processed Serially and not in Parallel. So somehow with the CTE it could not figure out the 1st Query can be run in Parallel, as well as the relationship of the 1st and 2nd query. Its possible that with multiple CTE expressions it assumes some dependency and did not look ahead far enough.

Another test you can do with the CTE is keep the CTE_Patterns but eliminate the CTE_Emails by putting that as a "subquery derived" table to the 3rd Query in the CTE. It would be curious to see the Execution Plan, and see if there is Parallelism when expressed that way.

score 1 · Answer 3 · answered Jun 10 '14 at 21:41

1

In my experience it's best to use CTE's for recursion and temp tables when you need to join back to the data. Makes for a much faster query typically.

answered Jun 10 '14 at 21:41

SDr6

21
3

Why is this CTE so much slower than using temp tables?

3 Answers3

Linked