I have a case where using a JOIN or an IN will give me the correct results... Which typically has better performance and why? How much does it depend on what database server you are running? (FYI I am using MSSQL)
-
Sorry for the possible dupe... didn't find that question when I was searching – DigitalZebra Jul 29 '09 at 13:50
-
:) I was actually looking for a different article I used when I researched into something similar a while ago, and stumbled across that one by mistake – AdaTheDev Jul 29 '09 at 13:48
-
Note that this question directly asks about the difference in performance (which depends on optimization). However, JOIN and IN have different semantics! As SQL is a declarative language, I recommend using the way that fits your use case, relying on the underlying query optimizer (as long as possible). Thus, if possible, first check which way would be more suited (probably IN or even EXISTS in this case). – P. Rathje Apr 06 '21 at 22:15
6 Answers
Generally speaking, IN
and JOIN
are different queries that can yield different results.
SELECT a.*
FROM a
JOIN b
ON a.col = b.col
is not the same as
SELECT a.*
FROM a
WHERE col IN
(
SELECT col
FROM b
)
, unless b.col
is unique.
However, this is the synonym for the first query:
SELECT a.*
FROM a
JOIN (
SELECT DISTINCT col
FROM b
)
ON b.col = a.col
If the joining column is UNIQUE
and marked as such, both these queries yield the same plan in SQL Server
.
If it's not, then IN
is faster than JOIN
on DISTINCT
.
See this article in my blog for performance details:

- 413,100
- 91
- 616
- 614
-
Yeah it makes sense that they would execute the same if the joining column is unique (which it is in my case) – DigitalZebra Jul 29 '09 at 13:48
-
1On a similar note, should I use IN(SELECT DISTINCT ...) or simply IN(SELECT ...)? – moo Jul 29 '09 at 13:53
-
10@orlandu63: `IN` implies `DISTINCT`. `SQL Server` is smart enough to notice it, and will generate same plans for both queries. Not sure, though, how other `RDBMS`'s will behave. – Quassnoi Jul 29 '09 at 14:05
-
1>>IN and JOIN are different queries that can yield different results. Can you please explain why would it generate different result in this case even if b.col is not unique? – Abhijeet Jul 10 '15 at 08:42
-
1https://explainextended.com/2009/06/16/in-vs-join-vs-exists/ Really helps me.. Thank you.. – Abbas Galiyakotwala Jul 15 '16 at 09:21
That's rather hard to say - in order to really find out which one works better, you'd need to actually profile the execution times.
As a general rule of thumb, I think if you have indices on your foreign key columns, and if you're using only (or mostly) INNER JOIN conditions, then the JOIN will be slightly faster.
But as soon as you start using OUTER JOIN, or if you're lacking foreign key indexes, the IN might be quicker.
Marc

- 732,580
- 175
- 1,330
- 1,459
-
I was thinking this too... because it seems JOIN is a more common case and would more likely be optimized – DigitalZebra Jul 29 '09 at 13:49
This Thread is pretty old but still mentioned often. For my personal taste it is a bit incomplete, because there is another way to ask the database with the EXISTS keyword which I found to be faster more often than not.
So if you are only interested in values from table a you can use this query:
SELECT a.*
FROM a
WHERE EXISTS (
SELECT *
FROM b
WHERE b.col = a.col
)
The difference might be huge if col is not indexed, because the db does not have to find all records in b which have the same value in col, it only has to find the very first one. If there is no index on b.col and a lot of records in b a table scan might be the consequence. With IN or a JOIN this would be a full table scan, with EXISTS this would be only a partial table scan (until the first matching record is found).
If there a lots of records in b which have the same col value you will also waste a lot of memory for reading all these records into a temporary space just to find that your condition is satisfied. With exists this can be usually avoided.
I have often found EXISTS faster then IN even if there is an index. It depends on the database system (the optimizer), the data and last not least on the type of index which is used.

- 309
- 2
- 6
-
5On MSSql the fact that exists is better than a IN seems not true. For more information: http://explainextended.com/2009/06/16/in-vs-join-vs-exists/ On here you can read that: "Many think that EXISTS is more efficient that IN, because EXISTS returns only one row. This is not true for SQL Server. As we can see from the examples above, EXISTS and IN produces exactly same plans. This is because EXISTS is more flexible than IN. An IN can always be rewritten as EXISTS (using a simple WHERE condition with an equijoin) but not vice versa." – Micaël Félix Mar 10 '14 at 08:15
A interesting writeup on the logical differences: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I am pretty sure that assuming that the relations and indexes are maintained a Join will perform better overall (more effort goes into working with that operation then others). If you think about it conceptually then its the difference between 2 queries and 1 query.
You need to hook it up to the Query Analyzer and try it and see the difference. Also look at the Query Execution Plan and try to minimize steps.

- 2,831
- 1
- 24
- 28
Each database's implementation but you can probably guess that they all solve common problems in more or less the same way. If you are using MSSQL have a look at the execution plan that is generated. You can do this by turning on the profiler and executions plans. This will give you a text version when you run the command.
I am not sure what version of MSSQL you are using but you can get a graphical one in SQL Server 2000 in the query analyzer. I am sure that this functionality is lurking some where in SQL Server Studio Manager in later versions.
Have a look at the exeuction plan. As far as possible avoid table scans unless of course your table is small in which case a table scan is faster than using an index. Read up on the different join operations that each different scenario produces.

- 13,110
- 20
- 78
- 138
The optimizer should be smart enough to give you the same result either way for normal queries. Check the execution plan and they should give you the same thing. If they don't, I would normally consider the JOIN to be faster. All systems are different, though, so you should profile the code on your system to be sure.

- 399,467
- 113
- 570
- 794