SQL "select where not in subquery" returns no results

Question

Disclaimer: I have figured out the problem (I think), but I wanted to add this issue to Stack Overflow since I couldn't (easily) find it anywhere. Also, someone might have a better answer than I do.

I have a database where one table "Common" is referenced by several other tables. I wanted to see what records in the Common table were orphaned (i.e., had no references from any of the other tables).

I ran this query:

select *
from Common
where common_id not in (select common_id from Table1)
and common_id not in (select common_id from Table2)

I know that there are orphaned records, but no records were returned. Why not?

(This is SQL Server, if it matters.)

This https://stackoverflow.com/a/129152/1667619 answers the WHY question pretty well. — Ruchan, Jul 12 '17 at 06:34

Quassnoi · Accepted Answer · 2009-09-19T21:46:09.467

Update:

These articles in my blog describe the differences between the methods in more detail:

There are three ways to do such a query:

LEFT JOIN / IS NULL:

SELECT  *
FROM    common
LEFT JOIN
        table1 t1
ON      t1.common_id = common.common_id
WHERE   t1.common_id IS NULL

NOT EXISTS:

SELECT  *
FROM    common
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    table1 t1
        WHERE   t1.common_id = common.common_id
        )

NOT IN:

SELECT  *
FROM    common
WHERE   common_id NOT IN
        (
        SELECT  common_id
        FROM    table1 t1
        )

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) returns NULL when a value does not match anything in a list containing a NULL.

This may be confusing but may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT  common_id
FROM    table1 t1
)

This condition is a boolean sum (OR) of the comparisons against every value in the list. If no comparison is TRUE and at least one value is NULL, that comparison yields NULL, which makes the whole result NULL; negating it for NOT IN leaves it NULL.

We can never say definitely that common_id is not equal to anything in this list, since at least one of the values is NULL.
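
The NULL behaviour described above can be checked with a few scalar expressions. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption; the question is about SQL Server, but IN and NOT IN follow the same three-valued logic here). The helper name scalar is made up for illustration.

```python
import sqlite3

# SQLite is only a convenient stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")

def scalar(expr):
    """Evaluate one SQL expression; Python None represents SQL NULL."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# A value that matches something in the list: IN is TRUE, NOT IN is FALSE.
match_in = scalar("1 IN (NULL, 1, 2)")
match_not_in = scalar("1 NOT IN (NULL, 1, 2)")

# A value that matches nothing, with a NULL in the list: both are NULL.
no_match_in = scalar("3 IN (NULL, 1, 2)")
no_match_not_in = scalar("3 NOT IN (NULL, 1, 2)")

print(match_in, match_not_in, no_match_in, no_match_not_in)
```

With a NULL in the list, NOT IN can come out FALSE or NULL but never TRUE, which is why a WHERE clause built on it keeps no rows.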

Suppose we have these data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3, NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).
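
To reproduce this on the sample data, here is a small sketch using Python's sqlite3 module (an assumption made for runnability; the three queries are the ones from this answer, selecting only common_id). The helper name ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE common (common_id INTEGER);
    CREATE TABLE table1 (common_id INTEGER);
    INSERT INTO common VALUES (1), (3);
    INSERT INTO table1 VALUES (NULL), (1), (2);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

left_join = ids("""
    SELECT common.common_id
    FROM common
    LEFT JOIN table1 t1 ON t1.common_id = common.common_id
    WHERE t1.common_id IS NULL
""")

not_exists = ids("""
    SELECT common_id
    FROM common
    WHERE NOT EXISTS (SELECT NULL FROM table1 t1
                      WHERE t1.common_id = common.common_id)
""")

not_in = ids("""
    SELECT common_id
    FROM common
    WHERE common_id NOT IN (SELECT common_id FROM table1 t1)
""")

print(left_join, not_exists, not_in)
```

The first two lists contain 3; the NOT IN list is empty because of the NULL row in table1.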

In MySQL, in the case of a non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not by much).

In Oracle, all three queries yield the same plan (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, since they are optimized to an Anti Join, while NOT IN uses a hashed subplan (or even a plain subplan if the subquery is too large to hash).

+1 because, four and a half years on, this answer helped me out with a problem that had me stumped! — Carson63000, Mar 26 '14 at 04:58
@Carson63000 Snap! I thought I was going mad before I saw this answer — Bobby, Jun 04 '14 at 07:49
What does the `NOT EXISTS ` part do in the second solution? What purpose does the `SELECT NULL` part serve ? — Istiaque Ahmed, Nov 10 '17 at 12:06
@IstiaqueAhmed: `NOT EXISTS` evaluates to TRUE if the query inside it returns no rows. `SELECT NULL` could as well be `SELECT *` or `SELECT 1` or anything else; the `NOT EXISTS` predicate does not look at the rows' values, only at whether any rows exist. — Quassnoi, Nov 10 '17 at 13:04
In typical cases, the first alternative (the LEFT | RIGHT JOIN / IS NULL), is an exclusive join, and SQL databases are designed to handle this request with very high performance. — alejandrob, Mar 04 '22 at 19:56

score 49 · Answer 2 · answered Sep 10 '09 at 17:29

If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.

Don't write IN clauses that allow nulls in the list side. Filter them out!

common_id not in
(
  select common_id from Table1
  where common_id is not null
)

answered Sep 10 '09 at 17:29

Amy B

nulls in the in-clause-list are a common reason for missing query results. – Amy B Sep 11 '09 at 13:01
' When comparing with a null, the answer is unknown ' - from the answer by @Jeremy Stein. From `common_id not in`, we can still have `common_id` value that is `NULL`. So doesn't the problem of getting no results still persist ? – Istiaque Ahmed Nov 10 '17 at 11:10

score 12 · Answer 3 · edited Nov 05 '22 at 16:49

The short answer:

There is a NULL in the collection returned by your subquery. You can solve the problem by filtering out that NULL inside the subquery, or by using the NOT EXISTS predicate instead of NOT IN, since it excludes NULLs implicitly.

The long answer (From T-SQL Fundamentals, Third edition, by Itzik Ben-Gan)

This is an example: imagine there is an order with a NULL custid in the Sales.Orders table, so the subquery returns some integers and a NULL value.

SELECT custid, companyname
FROM Sales.Customers
WHERE custid NOT IN(SELECT O.custid
                    FROM Sales.Orders AS O);

The explanation on why the query from above returns an empty set:

Obviously, the culprit here is the NULL customer ID you added to the Orders table. The NULL is one of the elements returned by the subquery. Let’s start with the part that does behave like you expect it to. The IN predicate returns TRUE for a customer who placed orders (for example, customer 85), because such a customer is returned by the subquery. The NOT operator negates the IN predicate; hence, the NOT TRUE becomes FALSE, and the customer is discarded. The expected behavior here is that if a customer ID is known to appear in the Orders table, you know with certainty that you do not want to return it.

However (take a deep breath), if a customer ID from Customers doesn’t appear in the set of non-NULL customer IDs in Orders, and there’s also a NULL customer ID in Orders, you can’t tell with certainty that the customer is there—and similarly you can’t tell with certainty that it’s not there. Confused? I hope I can clarify this explanation with an example.

The IN predicate returns UNKNOWN for a customer such as 22 that does not appear in the set of known customer IDs in Orders. That’s because when you compare it with known customer IDs you get FALSE, and when you compare it with a NULL you get UNKNOWN. FALSE OR UNKNOWN yields UNKNOWN. Consider the expression 22 NOT IN (1, 2, <other non-22 values>, NULL). This expression can be rephrased as NOT 22 IN (1, 2, …, NULL). You can expand this expression to NOT (22 = 1 OR 22 = 2 OR … OR 22 = NULL). Evaluate each individual expression in the parentheses to its truth value and you get NOT (FALSE OR FALSE OR … OR UNKNOWN), which translates to NOT UNKNOWN, which evaluates to UNKNOWN.

The logical meaning of UNKNOWN here, before you apply the NOT operator, is that it can’t be determined whether the customer ID appears in the set, because the NULL could represent that customer ID. The tricky part here is that negating the UNKNOWN with the NOT operator still yields UNKNOWN. This means that in a case where it is unknown whether a customer ID appears in a set, it is also unknown whether it doesn’t appear in the set. Remember that a query filter discards rows that get UNKNOWN in the result of the predicate.

In short, when you use the NOT IN predicate against a subquery that returns at least one NULL, the query always returns an empty set. So, what practices can you follow to avoid such trouble? First, when a column is not supposed to allow NULLs, be sure to define it as NOT NULL. Second, in all queries you write, you should consider NULLs and the three-valued logic. Think explicitly about whether the query might process NULLs, and if so, whether SQL’s treatment of NULLs is correct for you. When it isn’t, you need to intervene. For example, our query returns an empty set because of the comparison with the NULL. If you want to check whether a customer ID appears only in the set of known values, you should exclude the NULLs—either explicitly or implicitly. To exclude them explicitly, add the predicate O.custid IS NOT NULL to the subquery, like this:

SELECT custid, companyname
FROM Sales.Customers
WHERE custid NOT IN(SELECT O.custid
                    FROM Sales.Orders AS O
                    WHERE O.custid IS NOT NULL);

You can also exclude the NULLs implicitly by using the NOT EXISTS predicate instead of NOT IN, like this:

SELECT custid, companyname
FROM Sales.Customers AS C
WHERE NOT EXISTS
   (SELECT *
    FROM Sales.Orders AS O
    WHERE O.custid = C.custid);

Recall that unlike IN, EXISTS uses two-valued predicate logic. EXISTS always returns TRUE or FALSE and never UNKNOWN. When the subquery stumbles into a NULL in O.custid, the expression evaluates to UNKNOWN and the row is filtered out. As far as the EXISTS predicate is concerned, the NULL cases are eliminated naturally, as though they weren’t there. So EXISTS ends up handling only known customer IDs. Therefore, it’s safer to use NOT EXISTS than NOT IN.
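
The two-valued behaviour of EXISTS can be seen directly. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in engine; the one-row table t1 and the helper scalar are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A hypothetical table whose only row holds a NULL key.
conn.executescript("""
    CREATE TABLE t1 (common_id INTEGER);
    INSERT INTO t1 VALUES (NULL);
""")

def scalar(expr):
    """Evaluate one SQL expression; Python None represents SQL NULL."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# The inner comparison NULL = 1 is UNKNOWN, so the inner query returns no
# rows; EXISTS itself is then plainly FALSE (0), not NULL.
exists_match = scalar("EXISTS (SELECT * FROM t1 WHERE common_id = 1)")
not_exists_match = scalar("NOT EXISTS (SELECT * FROM t1 WHERE common_id = 1)")

# EXISTS looks only at whether a row comes back, not at the values in it.
exists_null_row = scalar("EXISTS (SELECT NULL FROM t1)")

print(exists_match, not_exists_match, exists_null_row)
```

The UNKNOWN stays inside the subquery's own filter, so the outer NOT EXISTS still gets a definite answer.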

The information above is taken from Chapter 4 - Subqueries, T-SQL Fundamentals, Third edition

score 7 · Answer 4 · answered Sep 10 '09 at 16:08

Table1 or Table2 has some null values for common_id. Use this query instead:

select *
from Common
where common_id not in (select common_id from Table1 where common_id is not null)
and common_id not in (select common_id from Table2 where common_id is not null)
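
As a rough check of this fix, the sketch below runs the original query and the filtered one side by side. It uses Python's sqlite3 module rather than SQL Server, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: only common_id 1 is an orphan, and Table2
# contains a NULL reference.
conn.executescript("""
    CREATE TABLE Common (common_id INTEGER);
    CREATE TABLE Table1 (common_id INTEGER);
    CREATE TABLE Table2 (common_id INTEGER);
    INSERT INTO Common VALUES (1), (2), (3);
    INSERT INTO Table1 VALUES (2);
    INSERT INTO Table2 VALUES (3), (NULL);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# The query from the question: the NULL in Table2 hides the orphan.
unfiltered = ids("""
    SELECT common_id FROM Common
    WHERE common_id NOT IN (SELECT common_id FROM Table1)
      AND common_id NOT IN (SELECT common_id FROM Table2)
""")

# The query from this answer: NULLs are removed inside each subquery.
filtered = ids("""
    SELECT common_id FROM Common
    WHERE common_id NOT IN (SELECT common_id FROM Table1
                            WHERE common_id IS NOT NULL)
      AND common_id NOT IN (SELECT common_id FROM Table2
                            WHERE common_id IS NOT NULL)
""")

print(unfiltered, filtered)
```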

answered Sep 10 '09 at 16:08

Jeremy Stein

What if there's data in one table but not the other? Do you want "and" or "or" there? – Philip Kelley Sep 10 '09 at 16:30
I'm looking for records not referenced in any table, so I want AND. I'll clarify the question. – Jeremy Stein Sep 10 '09 at 16:35

Austin Salonen · Answer 5 · 2009-09-10T18:31:33.783

Just off the top of my head...

select c.commonID, t1.commonID, t2.commonID
from Common c
     left outer join Table1 t1 on t1.commonID = c.commonID
     left outer join Table2 t2 on t2.commonID = c.commonID
where t1.commonID is null 
     and t2.commonID is null

I ran a few tests and here were my results w.r.t. @patmortech's answer and @rexem's comments.

If either Table1 or Table2 is not indexed on commonID, you get a table scan but @patmortech's query is still twice as fast (for a 100K row master table).

If neither is indexed on commonID, you get two table scans and the difference is negligible.

If both are indexed on commonID, the "not exists" query runs in 1/3 the time.

edited Sep 10 '09 at 18:31

answered Sep 10 '09 at 16:20

Austin Salonen

That should be an AND in the where clause. Otherwise, that works. – Jeremy Stein Sep 10 '09 at 16:33
changed per your comment. The "or" picks out orphans in either table. – Austin Salonen Sep 10 '09 at 16:35
That's better. By the way, is there some reason I should use outer joins rather than the subquery? – Jeremy Stein Sep 10 '09 at 16:37
Readability is the primary one. I suspect a better execution plan would be generated but without a query plan, I can't confirm. – Austin Salonen Sep 10 '09 at 16:57
This approach is worse than using NOT EXISTS - the join results in fetching more rows than it needs, and the results are then compared for the columns being null. And NOT EXISTS is more readable to boot. – OMG Ponies Sep 10 '09 at 17:29
@rexem - you can't make that claim without an execution plan. – Amy B Sep 10 '09 at 17:31
: Think about what the query has to do to produce the results. It's OK to think for yourself than to hide behind the execution plan defense. An execution plan is only as good as the data model. – OMG Ponies Sep 10 '09 at 17:36
@rexem -- you've argued a point that should now be an answer. – Austin Salonen Sep 10 '09 at 17:44
@Austin: That doesn't stop hack answers like these from perpetuating themselves. Don't get me wrong - your answer works, but it's in the realm of putting SELECTs in the SELECT clause. – OMG Ponies Sep 10 '09 at 17:49
@rexem -- point conceded. But you could add value to the question by posting a well-explained response. – Austin Salonen Sep 10 '09 at 17:51
I tried both ways and, for my data, the subqueries were 5-10% faster. – Jeremy Stein Sep 10 '09 at 17:52
@Austin: I've added my comment to that one too. Sorry if you felt you were being made a target. – OMG Ponies Sep 10 '09 at 17:58
@rexem -- I still would like to see your answer explained on something I could vote and give you credit for. I used @patmortech's answer and compared with mine and it resulted in nearly identical query plans WRT cost. Both did table scans of T1 & T2. My query 51% of the batch; his 49%. – Austin Salonen Sep 10 '09 at 18:04
@Austin: Appreciated, but patmortech's answer was in before I found the question. – OMG Ponies Sep 10 '09 at 18:11
It may have been but it does little to explain why it's better than an alternative. – Austin Salonen Sep 10 '09 at 18:15
for most of the stuff i've done using the joins with the where clause were faster than the sub queries. *shrugs*. – DForck42 Sep 10 '09 at 20:08

score 4 · Answer 6 · answered Sep 10 '09 at 16:37

select *
from Common c
where not exists (select t1.commonid from table1 t1 where t1.commonid = c.commonid)
and not exists (select t2.commonid from table2 t2 where t2.commonid = c.commonid)

answered Sep 10 '09 at 16:37

patmortech

score 3 · Answer 7 · answered Sep 10 '09 at 16:25

SELECT T.common_id
  FROM Common T
       LEFT JOIN Table1 T1 ON T.common_id = T1.common_id
       LEFT JOIN Table2 T2 ON T.common_id = T2.common_id
 WHERE T1.common_id IS NULL
   AND T2.common_id IS NULL

answered Sep 10 '09 at 16:25

manji

This approach is worse than using NOT EXISTS - the join results in fetching more rows than it needs, and the results are then compared for the columns being null. It works, but the performance will not be as good - possibly worse than using IN with correlated subqueries. – OMG Ponies Sep 10 '09 at 17:50

Jeremy Stein · Answer 8 · 2009-09-11T14:19:40.547

Let's suppose these values for common_id:

Common - 1
Table1 - 2
Table2 - 3, null

We want the row in Common to return, because it doesn't exist in any of the other tables. However, the null throws in a monkey wrench.

With those values, the query is equivalent to:

select *
from Common
where 1 not in (2)
and 1 not in (3, null)

That is equivalent to:

select *
from Common
where not (1=2)
and not (1=3 or 1=null)

This is where the problem starts. When comparing with a null, the answer is unknown. So the query reduces to

select *
from Common
where not (false)
and not (false or unknown)

false or unknown is unknown:

select *
from Common
where true
and not (unknown)

true and not unknown is also unknown:

select *
from Common
where unknown

The where condition does not return records where the result is unknown, so we get no records back.
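
The reduction above can be replayed mechanically. This is a small sketch of three-valued logic in Python, with None standing in for unknown (an assumption of the sketch, as are the helper names eq3, not3, and3 and or3).

```python
def eq3(x, y):
    """SQL-style equality: comparing with NULL (None) gives unknown (None)."""
    if x is None or y is None:
        return None
    return x == y

def not3(a):
    """NOT leaves unknown as unknown."""
    return None if a is None else (not a)

def and3(a, b):
    """AND: false wins over unknown, unknown wins over true."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """OR: true wins over unknown, unknown wins over false."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# where not (1=2) and not (1=3 or 1=null)
first = not3(eq3(1, 2))                          # not (false) -> true
second = not3(or3(eq3(1, 3), eq3(1, None)))      # not (unknown) -> unknown
result = and3(first, second)                     # true and unknown -> unknown

print(first, second, result)
```

The final value is None (unknown), and a where clause keeps only rows whose condition is true.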

One way to deal with this is to use the exists operator rather than in. Exists never returns unknown because it operates on rows rather than columns. (A row either exists or it doesn't; none of this null ambiguity at the row level!)

select *
from Common
where not exists (select common_id from Table1 where common_id = Common.common_id)
and not exists (select common_id from Table2 where common_id = Common.common_id)

score 2 · Answer 9 · edited Jun 20 '20 at 09:12

this worked for me :)

select * from Common
where common_id not in (select ISNULL(common_id,'dummy-data') from Table1)
and common_id not in (select ISNULL(common_id,'dummy-data') from Table2)

edited Jun 20 '20 at 09:12

Community

answered Apr 08 '13 at 04:41

arced

@marlar, the sub-queries always return 1 or 0, not a list of values. So how will the `NOT IN` perform there ? – Istiaque Ahmed Nov 10 '17 at 12:14

score 2 · Answer 10 · edited May 18 '20 at 23:19

Please follow the below example to understand the above topic:

Also you can visit the following link to know Anti join

select department_name,department_id from hr.departments dep
where not exists 
    (select 1 from hr.employees emp
    where emp.department_id=dep.department_id
    )
order by dep.department_name;

DEPARTMENT_NAME DEPARTMENT_ID
Benefits    160
Construction    180
Contracting 190
.......

But if we use NOT IN in that case we do not get any data.

select Department_name,department_id from hr.departments dep 
where department_id not in (select department_id from hr.employees );

no data found

This happens because (select department_id from hr.employees) returns a NULL value, so the NOT IN predicate evaluates to either FALSE or UNKNOWN for every row and never to TRUE. We can see this if we change the SQL slightly, as below, and handle NULL values with the NVL function.

select Department_name,department_id from hr.departments dep 
where department_id not in (select NVL(department_id,0) from hr.employees )

Now we are getting data:

DEPARTMENT_NAME DEPARTMENT_ID
Treasury    120
Corporate Tax   130
Control And Credit  140
Shareholder Services    150
Benefits    160
....

Again we are getting data as we have handled the null value with NVL function.
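
The same effect can be sketched outside Oracle. The example below assumes SQLite (through Python's sqlite3), uses the standard COALESCE function in place of NVL, and works on invented departments and employees tables rather than the hr schema. It also shows a caveat: the replacement value must not be a real key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: department 10 has an employee, and one employee has
# no department (NULL).
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'Benefits'), (30, 'Treasury');
    INSERT INTO employees VALUES (10), (NULL);
""")

def dept_ids(subquery_column):
    """Return ids of departments not matched by the given subquery column."""
    sql = f"""
        SELECT department_id FROM departments
        WHERE department_id NOT IN (SELECT {subquery_column} FROM employees)
        ORDER BY department_id
    """
    return [row[0] for row in conn.execute(sql)]

plain = dept_ids("department_id")                          # NULL in the list
with_sentinel = dept_ids("COALESCE(department_id, 0)")     # 0 is not a real id
bad_sentinel = dept_ids("COALESCE(department_id, 20)")     # 20 IS a real id

print(plain, with_sentinel, bad_sentinel)
```

With the colliding value, department 20 silently disappears from the result, so filtering with IS NOT NULL or using NOT EXISTS is the less fragile choice.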

SQL results are not appearing in tabular form, please bear with me. — Rajesh Sarkar, May 18 '20 at 18:26

score 0 · Answer 11 · edited Jun 10 '13 at 12:36

select *,
(select COUNT(ID)  from ProductMaster where ProductMaster.CatID = CategoryMaster.ID) as coun 
from CategoryMaster

edited Jun 10 '13 at 12:36

Jigar Parekh

answered Jun 10 '13 at 12:17

Donga jayesh

score 0 · Answer 12 · answered Dec 07 '19 at 12:57

I had a case where a lookup failed because one table held the value as a double and the other as a string, so they would not match (or would not match without a cast). This affected only NOT IN; SELECT ... IN ... worked. Weird, but I thought I would share it in case anyone else runs into this; adding a cast was a simple fix.
