295

In this excellent SO question, differences between CTE and sub-queries were discussed.

I would like to specifically ask:

In what circumstance is each of the following more efficient/faster?

  • CTE
  • Sub-Query
  • Temporary Table
  • Table Variable

Traditionally, I've used lots of temp tables in developing stored procedures - as they seem more readable than lots of intertwined sub-queries.

Non-recursive CTEs encapsulate sets of data very well, and are very readable, but are there specific circumstances where one can say they will always perform better? or is it a case of having to always fiddle around with the different options to find the most efficient solution?


EDIT

I've recently been told that in terms of efficiency, temporary tables are a good first choice as they have an associated histogram i.e. statistics.

DineshDB
  • 5,998
  • 7
  • 33
  • 49
whytheq
  • 34,466
  • 65
  • 172
  • 267
  • 7
    General answer: **it depends.** And it depends on some many factors, any general statement is likely false - in some situations. Basically: you need to test and measure - see which works best for you! – marc_s Jun 23 '12 at 14:24
  • 1
    @marc_s - ok; maybe this question should be closed for being subjective? Mind you a lot of SQL questions on SO could be judged as subjective. – whytheq Jun 23 '12 at 14:36
  • 3
    It might get closed as being too broad - and I agree with you - lots of things and topics in SQL really will get an answer of **it depends**. Sometimes one can list two or three criteria to make a decision, but with your question here, it's just next to impossible to give sound advice - it depends on so much - your table structures, data in those tables, queries you're using, your indexing strategy and much much more.... – marc_s Jun 23 '12 at 14:38
  • @marc_s it'd be good to try and keep - any advice on possible edits to OP to try to make it more specific and narrow ? – whytheq Jun 23 '12 at 14:47
  • Please note this question is specific to SQL Server. For other DBs like postgres, a CTE is often much slower than equivalent subqueries (see [http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/](http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/)) – Jay Oct 17 '15 at 00:00
  • @Jay Yet for some reason in practice, the *vast majority* of complex queries I encounter in actual business software run *much faster* written as CTEs in Postgres than subqueries. This still perplexes me, but its a happy coincidence since those are usually much more readable months later (or by new eyes). "... any general statement is likely false" indeed! – zxq9 Nov 16 '15 at 10:39
  • @Jay, that link is old (2014). Everyone else reading Jay's comment, take note. – Paul-Sebastian Manole Aug 07 '18 at 15:32
  • [This](https://www.mssqltips.com/sqlservertip/5118/sql-server-cte-vs-temp-table-vs-table-variable-performance-test/) might be of interest and is, at the time of writing (Aug 2019) relatively recent. – Vérace Aug 13 '19 at 14:29

4 Answers4

324

SQL is a declarative language, not a procedural language. That is, you construct a SQL statement to describe the results that you want. You are not telling the SQL engine how to do the work.

As a general rule, it is a good idea to let the SQL engine and SQL optimizer find the best query plan. There are many person-years of effort that go into developing a SQL engine, so let the engineers do what they know how to do.

Of course, there are situations where the query plan is not optimal. Then you want to use query hints, restructure the query, update statistics, use temporary tables, add indexes, and so on to get better performance.

As for your question. The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. The results could then be stored and read multiple times. Unfortunately, SQL Server does not seem to take advantage of this basic optimization method (you might call this common subquery elimination).

Temporary tables are a different matter, because you are providing more guidance on how the query should be run. One major difference is that the optimizer can use statistics from the temporary table to establish its query plan. This can result in performance gains. Also, if you have a complicated CTE (subquery) that is used more than once, then storing it in a temporary table will often give a performance boost. The query is executed only once.

The answer to your question is that you need to play around to get the performance you expect, particularly for complex queries that are run on a regular basis. In an ideal world, the query optimizer would find the perfect execution path. Although it often does, you may be able to find a way to get better performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 12
    Some Microsoft Research on possible future improvements in this area is in the publication "Efficient Exploitation of Similar Subexpressions for Query Processing” [Available from here](http://research.microsoft.com/en-us/um/people/jrzhou/) – Martin Smith Jun 23 '12 at 18:43
  • 4
    Given that that paper was presented in 2007, any idea whether they have incorporated it in SQL Server 2012? – Gordon Linoff Jun 23 '12 at 19:20
  • 1
    I don't believe so. I haven't read anything to that effect unfortunately. – Martin Smith Jun 23 '12 at 21:03
  • 4
    A great answer! Just to emphasize: SQL is a declarative language, and we do not control HOW the data is pulled. Therefore, performance/speed varies from query to query. – Simcha Khabinsky Mar 31 '14 at 14:19
  • "The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer." This is not true in the specific case of postgres, because CTEs are optimization fences [as described here](http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/). I came to this question when a CTE-heavy query was taking too long, and it turned out using subqueries was several orders of magnitude faster. – Jay Oct 16 '15 at 02:30
  • @Jay . . . The question and this answer are specifically directed to SQL Server. What you say is true in almost every other database, which will materialize CTEs (such as Postgres) either always or sometimes (such as Oracle). – Gordon Linoff Oct 16 '15 at 22:14
  • @GordonLinoff whoops! Don't know how I missed the tags on this question. I'll leave my comment in case someone else makes the same mistake. – Jay Oct 16 '15 at 23:57
  • @GordonLinoff, Is there any performance improvement if we create indexes for Temp tables? – RGS Apr 08 '16 at 07:45
  • 2
    @RGS . . . Indexes on temporary tables definitely improve queries that can take advantage of those indexes -- as with indexes on a permanent table. But, if you materialize a subquery as a temporary table, you may lose the advantage of the indexes on the original tables. – Gordon Linoff Apr 09 '16 at 22:13
  • @GordonLinoff, I could not understand what you told in last point i.e., if you materialize a subquery as a temporary table. – RGS Apr 11 '16 at 04:48
  • 2
    @RGS . . .When a database engine materializes a subquery/CTE in the course of executing a complex query, it doesn't add indexes on the materialization. You can do this manually using temporary tables. – Gordon Linoff Apr 12 '16 at 02:22
  • One difference is that a CTE used more than once could be easily identified and calculated once :) yeah! but it runs one time in all places used?? – R.Akhlaghi Aug 29 '17 at 07:33
  • @rezaakhlaghi . . . In SQL Server, CTEs are treated as views and are run each time they are referenced. Most databases, however, do what you suggest. – Gordon Linoff Aug 30 '17 at 06:22
  • dear @gordon-linoff A way that was useful for me is : select repeatable data witch needed in sub queries once into a variable table (not #Temp Table :) and used it where I am need it. This reduced the cost of execution significantly :) As the friends said, CTE merely makes our code declaration more beautiful :) – R.Akhlaghi Sep 02 '17 at 11:43
  • Re. "CTE used more than once could be easily identified and calculated once" and the idea that Temp Table could be faster due to guaranteed Statistics: Seems that's only true if the sub-Query is independent on the rest of the main Query. With most sub-Queries (in On or Where Clauses) I've seen / had to write, the sub-Query is dependent on a particular Row's value(s) from another Table in the main Query. – Tom Sep 08 '17 at 15:36
  • Noone's mentioned effects on what our DBA calls "Contention" / "Deadlocks". He almost always creates Temp Tables mainly to avoid "Contention" / "Deadlocks" to avoid decreasing perf of *other* Queries (and only as a distant secondary reason to increase perf of the Query he's writing). He cites something he's read about how if a Query references too many Rows out of a Table, SQL Server just Locks the entire Table. I suspect he's violating YAGNI and KISS, as I've worked at shops w/ MUCH bigger / more heavily-used DB's and never needed Temp Tables if Queries properly written and Tables Indexed. – Tom Sep 08 '17 at 15:45
  • 4
    `"Unfortunately, SQL Server does not seem to take advantage of this basic optimization method"` - do you know whether it's still the case in year 2020? – MaxU - stand with Ukraine Apr 19 '20 at 15:16
102

There is no rule. I find CTEs more readable, and use them unless they exhibit some performance problem, in which case I investigate the actual problem rather than guess that the CTE is the problem and try to re-write it using a different approach. There is usually more to the issue than the way I chose to declaratively state my intentions with the query.

There are certainly cases when you can unravel CTEs or remove subqueries and replace them with a #temp table and reduce duration. This can be due to various things, such as stale stats, the inability to even get accurate stats (e.g. joining to a table-valued function), parallelism, or even the inability to generate an optimal plan because of the complexity of the query (in which case breaking it up may give the optimizer a fighting chance). But there are also cases where the I/O involved with creating a #temp table can outweigh the other performance aspects that may make a particular plan shape using a CTE less attractive.

Quite honestly, there are way too many variables to provide a "correct" answer to your question. There is no predictable way to know when a query may tip in favor of one approach or another - just know that, in theory, the same semantics for a CTE or a single subquery should execute the exact same. I think your question would be more valuable if you present some cases where this is not true - it may be that you have discovered a limitation in the optimizer (or discovered a known one), or it may be that your queries are not semantically equivalent or that one contains an element that thwarts optimization.

So I would suggest writing the query in a way that seems most natural to you, and only deviate when you discover an actual performance problem the optimizer is having. Personally I rank them CTE, then subquery, with #temp table being a last resort.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • 5
    +1 turning out to be quite a subjective question; I hope it doesn't get closed for being too vague as the answers thus far are informative. I realise :-) you don't like it when questions change but do you have any suggestions for narrowing the question in the OP? – whytheq Jun 23 '12 at 16:54
  • 3
    I think this question is fine, you'll notice there isn't a single vote-to-close yet, but if the answers start flailing around wildly it will probably get shut down. As I suggested in my answer, if you have a *particular* case where you see a big difference between a CTE and a subquery, start a new question with the actual queries and execution plans (and it might be a better fit on [dba.se](http://dba.stackexchange.com/)). Just realize that the answer to help with *that* query might not be the same answer for a different query with the same scenario. – Aaron Bertrand Jun 23 '12 at 16:59
  • Right under your question there are links `link / edit / close / flag` - if there have been any votes to close the question, you will see `close (n)` where `n` represents the number of users who have voted to close your question. If you click on the link you will see the reasons those users selected. – Aaron Bertrand Jun 23 '12 at 17:13
  • @whytheq also see [this recent blog post by Bob Beauchemin](http://www.sqlskills.com/BLOGS/BOBB/post/Does-everybody-get-that-(generalizing-esoteric-optimization-techniques).aspx). It doesn't treat CTE vs. subquery specifically but the same kind of concept applies: if you choose an unintuitive pattern for performance reasons, document the crap out of it and re-visit it to ensure that the quirk you discovered is still real. I might even suggest leaving the more natural version of the query commented out, unless you have a reliable source control system in place that holds the previous version. – Aaron Bertrand Jun 23 '12 at 18:41
  • 1
    Fixed link above: https://www.sqlskills.com/blogs/bobb/does-everybody-get-that-generalizing-esoteric-optimization-techniques/ – ADJenks Aug 08 '19 at 00:24
  • @adjenks Thanks, a lot can happen in 7 years. It's one of the unfortunate side effects of such valuable, long-lasting sites like this: links will eventually rot. – Aaron Bertrand Aug 08 '19 at 00:37
  • @AaronBertrand For references that I don't want to break, I use this: https://archive.org/web/ – ADJenks Aug 08 '19 at 01:24
  • @adjenks I do use that resource after links break, but I'm not going to pre-empt that by pulling a static link for every single URL I post, not just because it's a lot more work given that it's impossible to predict which links will break, but also since those that *don't* break are likely to get updated over time, while the static link to a specific archived copy won't reflect those changes. I update posts that have dead links (and have done so exhaustively for breaks like connect and sqlblog.com), but it's impossible to do for 6-year old comments. – Aaron Bertrand Aug 08 '19 at 01:28
  • @AaronBertrand Ah, I tend to want my exact reference to remain unchanged and believe that users can still see a newer version if they want to get it. If I don't archive it at the time I'm referencing it, then it might never get archived. To each their own. – ADJenks Aug 08 '19 at 01:54
  • @adjenks If you look at my profile on this and other sites, you might get a sense why I can't possibly archive every link at the time I'm referencing it, or take the time to prioritize which links are more likely to die and/or be more harmful if they do. Can you show some examples where you've done this pre-emptive archiving? – Aaron Bertrand Aug 08 '19 at 01:59
27

#temp is materalized and CTE is not.

CTE is just syntax so in theory it is just a subquery. It is executed. #temp is materialized. So an expensive CTE in a join that is execute many times may be better in a #temp. On the other side if it is an easy evaluation that is not executed but a few times then not worth the overhead of #temp.

The are some people on SO that don't like table variable but I like them as the are materialized and faster to create than #temp. There are times when the query optimizer does better with a #temp compared to a table variable.

The ability to create a PK on a #temp or table variable gives the query optimizer more information than a CTE (as you cannot declare a PK on a CTE).

Salman A
  • 262,204
  • 82
  • 430
  • 521
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • what is the acronym "TVP" ... something similar to #temp ? – whytheq Jun 25 '12 at 08:00
  • TVP is becoming a common term, because it sounds impressive (to some). In short, a TVP is a table passed as a parameter. Anyone who has used Table variables will be right at home with them. – WonderWorker Oct 22 '19 at 11:56
  • 5
    WARNING - TVPs do not have execution plans! Don't use TVPs for anything other the the simplest of short lookup lists. If you do any complex joins, inserts or updates on them, you could run into massive optimisation issues. Trust me, I've been burnt by this. – Riegardt Steyn Oct 29 '19 at 11:23
14

Just 2 things I think make it ALWAYS preferable to use a # Temp Table rather then a CTE are:

  1. You can not put a primary key on a CTE so the data being accessed by the CTE will have to traverse each one of the indexes in the CTE's tables rather then just accessing the PK or Index on the temp table.

  2. Because you can not add constraints, indexes and primary keys to a CTE they are more prone to bugs creeping in and bad data.


-onedaywhen yesterday

Here is an example where #table constraints can prevent bad data which is not the case in CTE's

DECLARE @BadData TABLE ( 
                       ThisID int
                     , ThatID int );
INSERT INTO @BadData
       ( ThisID
       , ThatID
       ) 
VALUES
       ( 1, 1 ),
       ( 1, 2 ),
       ( 2, 2 ),
       ( 1, 1 );

IF OBJECT_ID('tempdb..#This') IS NOT NULL
    DROP TABLE #This;
CREATE TABLE #This ( 
             ThisID int NOT NULL
           , ThatID int NOT NULL
                        UNIQUE(ThisID, ThatID) );
INSERT INTO #This
SELECT * FROM @BadData;
WITH This_CTE
     AS (SELECT *
           FROM @BadData)
     SELECT *
       FROM This_CTE;
whytheq
  • 34,466
  • 65
  • 172
  • 267
ShanksPranks
  • 397
  • 3
  • 8
  • 4
    `ALWAYS` is bit too far but thanks for answer. In terms of readability the use of CTEs can be a good thing. – whytheq Jun 27 '16 at 12:30
  • 3
    I don't understand your second point at all. The way I see it, the query defining the CTE is analogous to the constraints you would put on the temp table, noting that the former can comprise arbitrarily complex predicates whereas the latter is far more limited (e.g. `CHECK` constraint referring to multiple rows/tables is not allowed). Can you post an example where a CTE exhibits a bug that the temp table equivalent does not? – onedaywhen Jul 07 '16 at 11:56
  • I know this is an old post; but just wanted to point out that your example is not comparing a CTE with a #temp table; you are comparing a @table variable with a #temp table. – Malachy Sep 07 '20 at 16:07