0

Is it a good idea to put data into temp table first before joining several other tables?

For instance, let's say I have the following:

  • tableA, 5 million rows
  • tableB, 5 million rows
  • tableC, 5 million rows
    ...
  • tableG

The Query I want to perform may look like:

SELECT 1 FROM tableA 
INNER JOIN tableB WITH (NOLOCK) ON tableA.col1= tableB.col1
LEFT JOIN tableC WITH (NOLOCK) ON ...
...
LEFT JOIN tableG WITH (NOLOCK) ON ...
WHERE tableA.someCol= conditionA AND tableB.someCol= conditionB...

Assuming with the filter, only a small subset of tableA will be returned. Would it be a good idea to pull data from tableA first before joining other tables, so as to avoid blocking and may be increase performance?

I tried googling but couldn't find any satisfactory answer. Thanks in advance.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
c 2
  • 1,127
  • 3
  • 13
  • 21
  • Why are you using `WITH (NOLOCK)`? – ypercubeᵀᴹ Jun 05 '13 at 19:11
  • To avoid unnecessary blocking I guess? – c 2 Jun 05 '13 at 19:14
  • If you don't mind dirty reads, ok. Read this if you do mind: [SQL - when should you use “with (nolock)”](http://stackoverflow.com/questions/686724/sql-when-should-you-use-with-nolock) – ypercubeᵀᴹ Jun 05 '13 at 19:19
  • Honestly for something like that I usually try it both ways and see which works best. You might try moving the subset of data to a "permanent temporary" table (create a physical table that you plan to delete later) and then index it etc before running your query. – Kenneth Fisher Jun 05 '13 at 19:25
  • I agree with Kenneth. I usually do it without the temp table approach and only go that route if I'm having performance problems. – liebs19 Jun 05 '13 at 19:29
  • I see. so it sounds more like trial and error. But wouldn't sql server compiler be intelligent enough to optimize it? – c 2 Jun 05 '13 at 20:10
  • I was reading this post which seems a bit relevant: http://stackoverflow.com/questions/5496870/sql-filtering-large-tables-with-joins-best-practices – c 2 Jun 05 '13 at 20:10

1 Answers1

1

Here are the "typicals" that I try. I usually try them out and see what happens under load and under "big data" that represents production row numbers, not dev row numbers.

Going from memory.

  1. If it is "one time" use, I try to use the derived table method.
  2. If it data in the "holder" table can be reused, I start with a @variableTable if the number of rows will be small. 2.b. The only time I've seen a @variableTable screw you is if you do some aggregate results...where the "summary rows" are only a few, but to generate the summary rows, you hit a large amount of rows. Think something like "Select StateAbbreviation, count(*) from dbo.LargeTableOfData".....there will only be 50 or so rows in the result table, BUT the aggregate data comes from a large table with lots of rows.
  3. Then I to go a #TempTable. Most times without an index. Sometimes with an index. 2 or 3 times in my life the index on the #TempTable resulted in significant improvement.

It is a "try it out game". Sometimes you just don't know until you give it the ole college try.

Use Northwind
GO

/* Temp Table , No Index(es) */

IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
        drop table #TempTableNoIndex
end


CREATE TABLE #TempTableNoIndex 
( 
    OrderID int
)


Insert into #TempTableNoIndex (OrderID) select top 5 OrderID from dbo.Orders


Select * from dbo.[Order Details] od where exists (select null from #TempTableNoIndex innerHolder where innerHolder.OrderID = od.OrderID )



/* Temp Table , With Index(es) */

IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
        drop table #TempTableWithIndex
end


CREATE TABLE #TempTableWithIndex 
( 
    OrderID int
)


CREATE INDEX IX_TEMPTABLE_TempTableWithIndex_OrderID ON #TempTableWithIndex (OrderID)


Insert into #TempTableWithIndex (OrderID) select top 5 OrderID from dbo.Orders


Select * from dbo.[Order Details] od where exists (select null from #TempTableWithIndex innerHolder where innerHolder.OrderID = od.OrderID )



/* Variable Table */


Declare @HolderTable TABLE (    OrderID int   )


Insert into @HolderTable (OrderID) select top 5 OrderID from dbo.Orders


Select * from dbo.[Order Details] od where exists (select null from @HolderTable innerHolder where innerHolder.OrderID = od.OrderID )


/* Derived Table */


Select * from dbo.[Order Details] od
join
( select top 5 OrderID from dbo.Orders ) as derived1
on od.OrderID = derived1.OrderID



/* Clean up */

IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
        drop table #TempTableNoIndex
end


IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
        drop table #TempTableWithIndex
end
granadaCoder
  • 26,328
  • 10
  • 113
  • 146
  • Thanks for the response. But in this case, wouldn't sql server be intelligent enough to optimize the query? – c 2 Jun 05 '13 at 20:13
  • From experience...it just isn't that simple. There are varying ways that the #tempDb can be setup, that can affect things. Then there is http://msdn.microsoft.com/en-us/library/ms181055(v=sql.105).aspx Execution Plan Caching . Research "option recompile" for another caveat. And then there is stuff that I'm not smart enough to know about. "A little trial and error" is needed, and I think the other dude said the same thing. If it was super easy, then anybody could be a dba. – granadaCoder Jun 05 '13 at 20:18
  • http://blogs.msdn.com/b/sqlcat/archive/2008/10/09/table-variable-vs-temp-table-and-estimated-vs-actual-query-plan.aspx – granadaCoder Jun 05 '13 at 20:18