2

I have two tables:

  1. Foo with 24.000.000 rows
  2. Bar with 16 rows

I'm considering rewriting query

SELECT * FROM Foo as F
WHERE EXISTS (SELECT 1 FROM Bar as B WHERE B.Baz = F.Baz)

with this one

SELECT * FROM Foo
WHERE Baz IN (SELECT Baz FROM Bar)

Edit: A third option was suggested in the comments. I didn't consider joining because I don't need any columns from Bar

SELECT * FROM Foo as F
JOIN Bar as B on B.Baz = F.Baz

But after looking at the execution plans for both queries I couldn't spot the difference. Are these queries really equivalent? Which query is better?

What should I consider when deciding between EXISTS and IN. I was wondering if SQL Server is smart enough to execute the nested query once and store the result for comparison, or does it execute the nested query for each row?

hidarikani
  • 1,121
  • 1
  • 11
  • 25
  • 5
    How about `SELECT Foo.* FROM Foo JOIN Bar ON Bar.Baz=Foo.Baz`? – Joachim Isaksson Aug 27 '13 at 11:09
  • 3
    I would use `EXISTS` but not for performance reasons, read this: [Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?](http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join) I assume it is also true for `EXISTS/IN`. – Tim Schmelter Aug 27 '13 at 11:11
  • @TimSchmelter I was just looking for this link from a previous answer of yours. Lucky me you happened to drop in. – Radu Gheorghiu Aug 27 '13 at 11:13
  • 1
    @JoachimIsaksson In the real query I have 8 EXISTS statemens. Repacing them with joins slowed down the query several times – hidarikani Aug 27 '13 at 11:25
  • @TimSchmelter - `EXISTS` and `IN` get the same plan. – Martin Smith Aug 27 '13 at 11:54
  • 1
    @JoachimIsaksson - That is not the same semantics. `EXISTS` is a semi join not an inner join. The `JOIN` can bring back duplicates. – Martin Smith Aug 27 '13 at 11:55
  • @MartinSmith: as i've mentioned "not for performance reasons". From link above: "_The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values_" – Tim Schmelter Aug 27 '13 at 12:06
  • @TimSchmelter - I'm well aware of that. [My answer here looks at that in detail](http://stackoverflow.com/questions/173041/not-in-vs-not-exists/11074428#11074428) That only applies to the `NOT` versions. – Martin Smith Aug 27 '13 at 12:07
  • @MartinSmith Yes, `DISTINCT` is needed for the same result, I was more interested in whether the query plan with a join was different in any significant way. I suspect it's not. – Joachim Isaksson Aug 27 '13 at 12:54

2 Answers2

4

Either EXISTS or IN are fine.

Both should give you the same plan with a logical semi join operator (NULL does not change the semantics here unlike NOT IN/NOT EXISTS)

Replacing with an INNER JOIN could change the results except if Baz is guaranteed to be unique in Bar.

Without this constraint an inner join could bring back additional rows that you would then need to get rid of with DISTINCT.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
0

Wouldn't this be the same as well but less confusing? The scenario you describe lends itself to an inner join.

SELECT F.* FROM Foo as F inner join Bar as B on F.Baz=B.Baz
jle
  • 9,316
  • 5
  • 48
  • 67