28

I have a table (MainTable) with a bit over 600,000 records. It joins onto itself via a 2nd table (JoinTable) in a parent/child type relationship:

SELECT   Child.ID, Parent.ID
FROM     MainTable
AS       Child
JOIN     JoinTable
      ON Child.ID = JoinTable.ID
JOIN     MainTable
AS       Parent
      ON Parent.ID = JoinTable.ParentID
     AND Parent.SomeOtherData = Child.SomeOtherData

I know that every child record has a parent record and the data in JoinTable is acurate.

When I run this query it takes literally minutes to run. However if I join to Parent using a Left Join then it takes < 1 second to run:

SELECT   Child.ID, Parent.ID
FROM     MainTable
AS       Child
JOIN     JoinTable
      ON Child.ID = JoinTable.ID
LEFT JOIN MainTable
AS       Parent
      ON Parent.ID = JoinTable.ParentID
     AND Parent.SomeOtherData = Child.SomeOtherData
WHERE    ...[some info to make sure we don't select parent records in the child dataset]...

I understand the difference in the results between an INNER JOIN and a LEFT JOIN. In this case it is returning exactly the same result as every child has a parent. If I let both queries run, I can compare the datasets and they are exactly the same.

Why is it that a LEFT JOIN runs so much faster than an INNER JOIN?


UPDATE Checked the query plans and when using an inner join it starts with the Parent dataset. When doing a left join it starts with the child dataset.

The indexes it uses are all the same.

Can I force it to always start with the child? Using a left join works, it just feels wrong.


Similar questions have been asked here before, but none seem to answer my question.

e.g. the selected answer in INNER JOIN vs LEFT JOIN performance in SQL Server says that Left Joins are always slower than Inner joins. The argument makes sense, but it's not what I'm seeing.

Community
  • 1
  • 1
Greg
  • 3,442
  • 3
  • 29
  • 50

2 Answers2

17

The Left join seems to be faster because SQL is forced to do the smaller select first and then join to this smaller set of records. For some reason the optimiser doesn't want to do this naturally.

3 ways to force the joins to happen in the right order:

  1. Select the first subset of data into a temporary table (or table variable) then join on it
  2. Use left joins (and remember that this could return different data because it's a left join not an inner join)
  3. use the FORCE ORDER keyword. Note that if table sizes or schemas change then the query plan may not be correct (see https://dba.stackexchange.com/questions/45388/forcing-join-order)
Community
  • 1
  • 1
Greg
  • 3,442
  • 3
  • 29
  • 50
  • 1
    I know this is kinda late, but you might want to make sure that your database statistics are up-to-date. If the query optimizer doesn't know about the relative table sizes, and value distributions in the join columns, it can make a *bad* query plan decision (SQL Server's has the most consistently good query optimizer of all the databases I work with). Here's a blog entry on the topic: http://blogs.msdn.com/b/buckwoody/archive/2009/08/18/sql-server-best-practices-auto-create-and-auto-update-statistics-should-be-on-most-of-the-time.aspx – Curt Jul 24 '13 at 17:13
  • Thanks for the idea. we had checked stats and they were up to date – Greg Jul 24 '13 at 21:07
  • 1
    I know its late but might help someone else. The only point I disagree with is the Table variable suggestion. Table variables always return an estimated row count of 1 regardless of how many rows are in the table. This can skew the plan tremendously. Read this http://blogs.msdn.com/b/psssql/archive/2014/08/11/if-you-have-queries-that-use-table-variables-sql-server-2012-sp2-can-help.aspx however, there is a traceflag in 2012 SP2 that could help – Eugene Niemand Oct 15 '14 at 12:34
  • Good point about the table variable. For us it worked out ok, but ymmv – Greg Jan 19 '15 at 01:40
  • A `LEFT JOIN` is absolutely not faster than an `INNER JOIN`. In fact, it's slower; by definition, an outer join (`LEFT JOIN` or `RIGHT JOIN`) has to do all the work of an `INNER JOIN` plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set. – Shafizadeh Aug 08 '15 at 12:02
  • @Sajad I understand what you're saying, however as I mentioned in this answer, the order of the joins changed in the query plan which took the query run time from minutes to sub-second. Left join means it does it after selecting the inner joins and (where possible) filtering – Greg Aug 08 '15 at 20:18
  • Your last sentence is weird for me, I don't know, maybe you are right, But I think `Inner join` means it does it after selecting the `left join` and *(where possible)* filtering. – Shafizadeh Aug 08 '15 at 20:25
  • "It is theoretically not faster than an INNER JOIN. It is NOT "absolutely not faster." https://stackoverflow.com/questions/2726657/inner-join-vs-left-join-performance-in-sql-server#comment26131412_2726683 – AjV Jsy Mar 30 '23 at 15:29
2

Try this one. Same result, different approach:

SELECT c.ID, p.ID 
FROM
(SELECT   Child.ID, JoinTable.ParentID
FROM     MainTable
AS       Child
JOIN     JoinTable
      ON Child.ID = JoinTable.ID) AS c
INNER JOIN 
(SELECT   Parent.ID, JoinTable.ID
FROM     MainTable
AS       Parent
JOIN     JoinTable
      ON Parent.ID = JoinTable.ParentID
     AND Parent.SomeOtherData = Child.SomeOtherData) AS p
ON c.ParentID = p.ID

If it does not help, use cte:

;WITH cte AS
(SELECT   Child.ID, JoinTable.ParentID
FROM     MainTable
AS       Child
JOIN     JoinTable
      ON Child.ID = JoinTable.ID)
SELECT cte.ID, Parent.ID
FROM cte INNER JOIN 
MainTable
AS       Parent
      ON Parent.ID = cte.ParentID
     AND Parent.SomeOtherData = cte.SomeOtherData
cha
  • 10,301
  • 1
  • 18
  • 26
  • CTE didn't seem to help, but forcing it into a table variable did. If no other answers come up then I'll run with that. – Greg Jun 14 '13 at 05:20