I'm writing what I think should be a simple query using the "not in" operator, and it's not doing what I expect.
Background:
- I have two tables, Contact and Company.
- Contact includes columns ContactID (person's identity) and CompanyID (which company they work for)
- CompanyID values are expected to be equivalent to the CompanyIDs in the Company table
- I want to write a query that checks how many people from the Contact table that have an "invalid" CompanyID (i.e., listed as working for a Company that isn't in the Company table)
I have a working query that does this:
select
count(ContactID)
from
Contact left join Company on Contact.CompanyID = Company.CompanyID
where
Company.CompanyID is null;
This query returns the value 2725538, which I believe to be the correct answer (I've done some simple "show me the top 10 rows" debugging, and it appears to be counting the right rows).
I wrote a second query which I expected to return the same result:
select
count(ContactID)
from
Contact
where
CompanyID not in
(select
CompanyID
from
Company)
However, this query instead returns 0.
To help me debug this, I checked two additional queries.
First, I tried commenting out the WHERE clause, which should give me all of the ContactIDs, regardless of whether they work for an invalid company:
select
count(ContactID)
from
Contact
This query returns 29722995.
Second, I tried removing the NOT from my query, which should give me the inverse of what I'm looking for (i.e., it should count the Contacts who work for valid companies):
select
count(ContactID)
from
Contact
where
CompanyID in
(select
CompanyID
from
Company)
This query returns 26997457.
Notably, these two numbers differ by exactly 2725538, the number returned by the first, working query. This is what I would expect if my second query was working. The total number of Contacts, minus the number of Contacts whose CompanyIDs are in the Company table, should equal the number of Contacts whose CompanyIDs are not in the Company table, shouldn't it?
So, why is the "not in" version of the query returning 0 instead of the correct answer?