Why different results from "WHERE EXISTS()" and WHERE id IN

Question

I get more results with this query:

SELECT * FROM table1
WHERE NOT EXISTS (
        SELECT table2.id FROM table2
        WHERE table1.id = table2.id
)

Than from this query:

SELECT * FROM table1
WHERE table1.id NOT IN (
        SELECT table2.id FROM table2
)

Why is this? I thought they were identical? By the way, table2 has some NULL values in the id field.

score 4 · Accepted Answer · answered Jul 17 '21 at 10:17

They are not identical.

When the subquery returns NULL, then the outer query returns no results at all. The reason is that NULL could match any value -- it means "unknown value" not "missing value" semantically. And an unknown value could match anything. In actual fact, the condition returns NULL in this case, which is treated as false.

For this reason, I strongly, strongly recommend never using NOT IN with a subquery. Use NOT EXISTS.

If you do, then you can do a NULL check:

SELECT *
FROM table1
WHERE table1.id NOT IN (
        SELECT table2.id FROM table2 WHERE table2.id IS NOT NULL
                       );

However, in my experience, you'll leave out the NULL check at some crucial time, and think that there are no non-matches when there are.

Why different results from "WHERE EXISTS()" and WHERE id IN

1 Answers1