Understanding SQL IN clause with NULL values

Question

Assume the following tables:

CREATE TABLE X (x_name VARCHAR(100));
CREATE TABLE Y (y_name VARCHAR(100));
INSERT INTO X VALUES ('blue');
INSERT INTO X VALUES ('red');
INSERT INTO Y VALUES ('blue');

Resulting in:

+---------+        +---------+
| Table X |        | Table Y |
+---------+        +---------+
|  x_name |        |  y_name |
+---------+        +---------+
|  'blue' |        |  'blue' |
|   'red' |        +---------+
+---------+

The results of the following queries are as expected:

SELECT * FROM X WHERE x_name IN (SELECT y_name FROM Y); will return one row | 'blue' |.
SELECT * FROM X WHERE x_name NOT IN (SELECT y_name FROM Y); will return one row | 'red' |.

Let's insert NULL into table Y:

INSERT INTO Y VALUES (NULL);

The first query will return the same result (blue). However, the second query from above will return no rows. Why is this?

score 1 · Accepted Answer · answered Jun 10 '20 at 14:30

Don't use not in with subqueries. Period. Use not exists; it does what you want:

select x.*
from x
where not exists (select 1 from y where y.y_name = x.x_name);

The problem is this. When you have:

x_name in ('a', 'b', null)

SQL actually returns NULL, not false. However, NULL is treated the same as false in where clauses (and when clauses but not for check constraints). So, the row gets filtered out.

When you negate this, either as:

not x_name in ('a', 'b', null)
x_name not in ('a', 'b', null)

The results is not NULL which is also NULL and everything gets filtered out.

Alas. The simplest solution in my opinion is to get in the habit of using not exists.

Understanding SQL IN clause with NULL values

1 Answers1