Does using WHERE IN hurt query performance?

Question

I've heard that using an IN Clause can hurt performance because it doesn't use Indexes properly. See example below:

SELECT ID, Name, Address 
FROM people
WHERE id IN (SELECT ParsedValue FROM UDF_ParseListToTable(@IDList))

Is it better to use the form below to get these results?

SELECT ID,Name,Address
FROM People as p
INNER JOIN UDF_ParseListToTable(@IDList) as ids
ON p.ID = ids.ParsedValue

Does this depend on which version of SQL Server you are using? If so which ones are affected?

Similar to http://stackoverflow.com/questions/761150/how-does-the-in-predicate-work-in-sql — Gavin Miller, Jan 27 '10 at 20:59

score 5 · Accepted Answer · edited Jul 18 '17 at 13:38

5

Yes, assuming relatively large data sets.

It's considered better to use EXISTS for large data sets. I follow this and have noticed improvements in my code execution time.

According to the article, it has to do with how the IN vs. EXISTS is internalized. Another article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

edited Jul 18 '17 at 13:38

Devasayal

40
8

answered Jan 27 '10 at 21:08

Frank V

25,141
34
106
144

2

Exists using a JOIN operator, where IN uses OR operators. – mrdenny Jan 27 '10 at 21:47
This answer is simply wrong. The first link implies that the query plan for a given query depends only on the query - at least when it comes to `in` and `exists` queries. That's completely untrue; the existance of things like "statistics" and "parameter sniffing" should be enough to see this idea doesn't match reality. The second article is fine, but it's completely irrelevant - it's not talking about performance at all, it's talking about the confusing semantics of `not in` with a null-containing set. – Eamon Nerbonne Jan 11 '15 at 14:42
To clarify: there certainly will be cases where an `IN` clause is suboptimal, but it's not a general trend; and the notion that it doesn't use indexes properly isn't backed up by anything this answer says or links to. – Eamon Nerbonne Jan 11 '15 at 17:23

score 2 · Answer 2 · answered Jan 27 '10 at 23:06

It's very simple to find out - open Management studio, put both versions of the query in, then run with the Show Execution plan turned on. Compare the two execution plans. Often, but not always, the query optimizer will make the same exact plan / literally do the same thing for different versions of a query that are logically equivalent.

In fact, that's its purpose - the goal is that the optimizer would take ANY version of a query, assuming the logic is the same, and make an optimal plan. Alas, the process isn't perfect.

Here's one scientific comparison:

http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

Raj More · Answer 3 · 2010-01-27T21:05:19.017

More than the IN or the Table Variable, I would think that proper use of an Index would increase the performance of your query.

Also, from the table name, it does not seem like you are going to have a lot of entries in it so which way you go may be moot point in this particular example.

Secondly, IN will be evaluated only once since there is no subquery. In your case, the @IDList variable is probably going to cause mistmatches you will need @IDList1, @IDList2, @IdList3.... because IN demands a list.

As a general rule of thumb, you should avoid IN with subqueries and use EXISTS with a join - you will get better performance more often than not.

score 0 · Answer 4 · answered Jan 27 '10 at 21:02

IN can hurt performance because SQL Server must generate a complete result set and then create potentially a huge IF statement, depending on the number of rows in the result set. BTW, calling a UDF can be a real performance hit as well. They are very nice to use but can really impact performance, if you are not careful. You can Google UDF and Performance to do some research on this.

score 0 · Answer 5 · answered Jan 27 '10 at 21:27

0

Your first example is not the same as your second example, because WHERE X IN (@variable) is the same as WHERE X = @variable (i.e. you cannot have variable lists).

Regarding performance, you'll have to look at the execution plans to see what indexes are chosen.

answered Jan 27 '10 at 21:27

Cade Roux

88,164
40
182
265

I looked at my original query and think I see what you're talking about. Take a look at my update – Abe Miessler Jan 27 '10 at 21:44
@AbeMiessler There is still a semantic difference between your versions. If there are duplicates in the list, it will cause a cross join effect in the second. So this should theoretically have a small impact on the execution plan, but probably not significant if the list is small (the first case plan would probably be the same as the second with an added aggregate to get a distinct list). My money would be on nearly similar execution plans. – Cade Roux Jan 28 '10 at 14:42

Does using WHERE IN hurt query performance?

5 Answers5