74

I've been looking into performance improvements for a series of procedures, and a colleague recently mentioned that he had achieved significant gains by using an INNER JOIN in place of EXISTS.

As part of investigating why this might be, I thought I would ask the question here.

So:

  • Can an INNER JOIN offer better performance than EXISTS?
  • Under what circumstances would this happen?
  • How might I set up a test case as proof?
  • Do you have any useful links to further documentation?

And really, any other experience people can bring to bear on this question.

I would appreciate it if any answers could address this question specifically, without suggesting other possible performance improvements. We've had quite a degree of success already, and I am just interested in this one item.

Any help would be much appreciated.

Michael Myers
James Wiseman
  • I'm currently looking at a stored procedure in SQL08 that makes heavy use of EXISTS, and wondering if INNER JOIN would be more efficient when working with multi-million row tables. – Ian Henderson Jul 23 '19 at 12:11
  • Would this answer have a more obvious winner if we say that the tables we are querying have millions of rows? (I'm thinking a JOIN would first load and join all rows with the other table before applying any WHERE clauses. EXISTS will loop over the left table's rows and check each against the EXISTS condition, so it could need less memory.) – Wasted_Coder Apr 01 '22 at 06:36

3 Answers

77

Generally speaking, INNER JOIN and EXISTS are different things.

The former can return duplicates and columns from both tables; the latter is a predicate, so it returns each matching record at most once and only records from one table.
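To make the difference concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than SQL Server, purely so that it runs anywhere, and the `customers` and `orders` tables are hypothetical names invented for illustration. It shows the semantics only, not performance.

```python
import sqlite3

# In-memory SQLite database; table names and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: one output row per matching (customer, order) pair.
join_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# EXISTS: a predicate on customers, so each customer appears at most once.
exists_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(join_rows)    # Ann appears twice because she has two orders
print(exists_rows)  # Ann appears once
```

Ann has two orders, so the join returns her twice, while the EXISTS version returns her once.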

If you do an inner join on a UNIQUE column, they exhibit the same performance.

If you do an inner join on a recordset with DISTINCT applied (to get rid of the duplicates), EXISTS is usually faster.

IN and EXISTS clauses (with an equijoin correlation) usually employ one of several SEMI JOIN algorithms, which are usually more efficient than a DISTINCT on one of the tables.
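As a rough way to set up a test case, the sketch below builds two hypothetical tables, checks that the DISTINCT join and the EXISTS query return the same rows, and times each. It runs on SQLite through Python's `sqlite3` module, which is an assumption made for portability: SQLite's optimizer is not SQL Server's, so the timings say nothing about SQL Server itself. On SQL Server you would run the same two queries with `SET STATISTICS TIME ON` and `SET STATISTICS IO ON` and compare the actual execution plans.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX ix_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 2001)],
)
# Only even-numbered customers have orders, ten each, so the join fans out.
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(c,) for c in range(2, 2001, 2) for _ in range(10)],
)

QUERIES = {
    "distinct join": """
        SELECT DISTINCT c.id
        FROM customers c
        INNER JOIN orders o ON o.customer_id = c.id
        ORDER BY c.id
    """,
    "exists": """
        SELECT c.id
        FROM customers c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY c.id
    """,
}

def run(sql):
    """Execute a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

results = {}
for label, sql in QUERIES.items():
    rows, elapsed = run(sql)
    results[label] = rows
    print(f"{label}: {len(rows)} rows in {elapsed * 1000:.2f} ms")

# A timing comparison is only meaningful if both queries return the same rows.
assert results["distinct join"] == results["exists"]
```

A single run is noisy; repeating each query several times and comparing medians gives a fairer picture.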

See this article in my blog:

Quassnoi
  • 3
    This is somewhat off topic, but I would suggest avoiding DISTINCT and using GROUP BY for better overall performance when returning distinct lists. DISTINCT doesn't perform as well as GROUP BY in general. It may help make up some of the difference between INNER JOIN and EXISTS as well. – EricI Mar 21 '13 at 20:51
  • 11
    @EricI: could you please provide an example of a query which is less efficient with `DISTINCT` than `GROUP BY`, provided that the outputs are identical? Thanks! – Quassnoi Mar 22 '13 at 09:54
16

Maybe, maybe not.

  • Most likely, the same plan will be generated
  • An INNER JOIN may require a DISTINCT to get the same output
  • EXISTS deals with NULL
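The answer does not spell out the NULL point. One common reading, which is an assumption here, is the contrast between NOT EXISTS and NOT IN when the subquery column contains NULL. The sketch below uses Python's `sqlite3` module and hypothetical `customers` and `orders` tables; the three-valued logic involved is standard SQL, but check it on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- One order has no customer recorded (NULL).
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# NOT EXISTS: the NULL row simply never matches, so Bob is returned.
not_exists_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

# NOT IN: comparing against a NULL yields UNKNOWN, so no row qualifies.
not_in_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o)
""").fetchall()

print(not_exists_rows)
print(not_in_rows)
```

With a single NULL in `orders.customer_id`, the NOT IN query returns nothing, while NOT EXISTS still returns the customer without orders.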
gbn
1

In SQL Server 2019, queries using IN, EXISTS, and JOIN can get different plans (if the correct indexes are added), so their performance also differs. The article at https://www.mssqltips.com/sqlservertip/6659/sql-exists-vs-in-vs-join-performance-comparison/ shows that JOIN is somewhat faster.

P.S. I understand that the question was about SQL Server 2005 (per the tags), but people mostly look for an answer by the question title.

Roma Ruzich