0

I'm writing what I think should be a simple query using the "not in" operator, and it's not doing what I expect.

Background:

  • I have two tables, Contact and Company.
  • Contact includes columns ContactID (person's identity) and CompanyID (which company they work for)
  • CompanyID values are expected to be equivalent to the CompanyIDs in the Company table
  • I want to write a query that checks how many people from the Contact table that have an "invalid" CompanyID (i.e., listed as working for a Company that isn't in the Company table)

I have a working query that does this:

select
    count(ContactID)
from
    Contact left join Company on Contact.CompanyID = Company.CompanyID
where
    Company.CompanyID is null;

This query returns the value 2725538, which I believe to be the correct answer (I've done some simple "show me the top 10 rows" debugging, and it appears to be counting the right rows).


I wrote a second query which I expected to return the same result:

select
    count(ContactID)
from
    Contact
where
    CompanyID not in
        (select
            CompanyID
        from
            Company)

However, this query instead returns 0.


To help me debug this, I checked two additional queries.

First, I tried commenting out the WHERE clause, which should give me all of the ContactIDs, regardless of whether they work for an invalid company:

select
    count(ContactID)
from
    Contact

This query returns 29722995.

Second, I tried removing the NOT from my query, which should give me the inverse of what I'm looking for (i.e., it should count the Contacts who work for valid companies):

select
    count(ContactID)
from
    Contact
where
    CompanyID in
        (select
            CompanyID
        from
            Company)

This query returns 26997457.


Notably, these two numbers differ by exactly 2725538, the number returned by the first, working query. This is what I would expect if my second query was working. The total number of Contacts, minus the number of Contacts whose CompanyIDs are in the Company table, should equal the number of Contacts whose CompanyIDs are not in the Company table, shouldn't it?

So, why is the "not in" version of the query returning 0 instead of the correct answer?

A_S00
  • 225
  • 2
  • 15

1 Answers1

2

the only issue could be of NULL CompanyID. Not In doesn't work with NULLs because of non-comparability of NULL.

try the following:

select
    count(ContactID)
from
    Contact
where
    CompanyID not in
        (select
            ISNULL(CompanyID,'')
        from
            Company)

you can see the example in db<>fiddle here.

Please find more details HERE.

sacse
  • 3,634
  • 2
  • 15
  • 24
  • Thanks, this is on the right track, the problem is definitely NULL (if I replace my whole WHERE clause with "WHERE CompanyID IS NULL" I get the right answer), but the query in this answer is still returning 0. I think maybe the issue is that I have NULLs in my Contact table, not in my Company table? I'll do some more troubleshooting. – A_S00 Apr 15 '20 at 18:25
  • `select ... from ... where companyid is not null and not in (select ... where companyid is not null)` will eliminate nulls from both tables and will give you apples to apples comparison between queries. – zedfoxus Apr 15 '20 at 18:32
  • Got it working; I just had to apply ISNULL to the *first* instance of CompanyID, rather than the one inside the NOT IN clause (since the NULLs causing problems were the ones in the Contact table). Accepting this as it led me to the right answer. Thanks! – A_S00 Apr 15 '20 at 18:41
  • @A_S00 gr8, you figured it out. thanks for accepting the answer. – sacse Apr 16 '20 at 04:06