1

I had a strange error where I did something like the following:

SELECT *
FROM table1
WHERE NAME COLLATE LATIN1_General_CI_AS NOT IN (select name FROM table2)

Name was a nvarchar(50) null. Table2 had null values in the name column. It would then match names that did not exist in table2 because of the null values. Meaning that this would return 0 results. And if I did:

SELECT *
FROM table1
WHERE NAME COLLATE LATIN1_General_CI_AS NOT IN (
  select name FROM table2 where name is not null
)

I would get some results.

One table had collation Danish_Norwegian_CI_AS and the other one latin1. Maybe the COLLATE is the source of the problem?

Can anyone explain why this happens? Maybe something with set theory?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Anders Rune Jensen
  • 3,758
  • 2
  • 42
  • 53

3 Answers3

1

NULL does not work as some folks would expect when used in a NOT IN or NOT EXISTS expression.

Basically, SQL Server can't say for sure that something isn't in a subset if the subset contains NULL since NULL is a state of having an unknown value.

I don't think collation even enters into it, the queries are different.

See a full explanation of NULL in NOT IN here.

Community
  • 1
  • 1
JNK
  • 63,321
  • 15
  • 122
  • 138
1

I think I would work around the issue by using

SELECT * 
  FROM table1 t1
 WHERE NOT EXISTS ( SELECT *
                      FROM table2 t2
                     WHERE (t1.NAME COLLATE LATIN1_General_CI_AS = t2.name)
                        OR (t1.NAME IS NULL AND t2.name IS NULL))

No fancy theory though, just practical experience =P

deroby
  • 5,902
  • 2
  • 19
  • 33
1

It has nothing to do with collation. Try set ansi_nulls off and the first query will work as you expected.

The reason is that when you try to compare something to NULL (e.g. a = NULL or a <> NULL) the result is UNKNOWN and the entire query fails. If you set ansi_nulls to off then the same comparison evaluates to TRUE or FALSE and the query "works".

In other words:

WHEN a NOT IN (1, 2, null)

means

WHEN a <> 1 AND a <> 2 AND a <> null

WHERE the last part a <> null evaluates to UNKNOWN and the query fails to return any rows.

lalibi
  • 3,057
  • 3
  • 33
  • 41