1

I have some funny bug with Postgres 9.5.1

I got 2 tables that contain related data contacts (id, name) and jobs (id, contact_id, name).

I'm not sure of the validity of this query (given the curious behavior explained just after).

-- get unassigned contacts 
select * from contacts where id not in (select contact_id from jobs);

Edit : The following case was how i tried to analyze the issue. See the end of post and comments to get why the query is not correct.

When testing with a case of contact id=20 without job, I got some (IMO) strange result (a notable difference in results between a select query and a join equivalent).

First, I need to assert some prerequisites (step A). Next, I show the result with join (step B). Finally, I show the result using subquery (step D). (Step C is the complementary request of D and is only here to highlight what I found strange).

A-0. check that there is datas in both tables : OK

select count(distinct id) from contacts;
--> returns 10100
select count(distinct id) from jobs;
--> returns 12000
select count(distinct id) from contacts where id in (select contact_id from jobs);
--> returns 10000

A-1. get name in table contacts for id=20 : OK

select name from contacts where id=20;
--> returns "NAME"

A-3. check contact id=20 is NOT in table jobs : OK

select id from jobs where contact_id=20;
--> returns nothing (0 row)

B. get name and (null) job id for contact id=20 with join : OK

select c.id, c.name, j.id
from contacts c
left join jobs j
on j.contact_id=c.id
where c.id=20;
--> returns 20, "NAME", <NULL>

C. get contact id=20 only if it is assigned in jobs : OK

select name from contacts where id in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's the expected result)

D. get contact id=20 only if is NOT assigned in jobs : KO

select name from contacts where id not in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's not the expected result - "NAME")

Funny conclusion

C and D queries got the same results.

In logical terms, this could means that in pgsql:

id NOT IN (..values..)  ==  id IN (..values..)
FALSE == TRUE

Can a "Postgres guru" find me a nice explanation or should I call the FBI?

Epilogue

following the answers

My query

select * from contacts where id not in (select contact_id from jobs);

was not correct, because NOT IN can't handle NULL values. Therefore, it is not the right selector to check (non)existence of a value.

See NULL values inside NOT IN clause.

The correct query is the following :

-- to get unassigned contacts 
select * from contacts c  where not exists (select 1 from jobs where contact_id=c.id);

For a specified id :

select * from contacts c  where not exists (select 1 from jobs where contact_id=c.id) and id=20;

This query works too :

select * from contacts where id not in (select contact_id from jobs where contact_id is not null);
Luffah
  • 13
  • 3
  • `NULL = NULL` does not yield True, in SQL. (and the IN() operator ignores NULLs) – wildplasser Apr 21 '20 at 22:01
  • https://stackoverflow.com/a/34380890/905902 – wildplasser Apr 21 '20 at 22:36
  • Read the manual for the functionality you are using. Also: This is a faq. Before considering posting please read the manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. Reflect your research. See [ask] & the voting arrow mouseover texts. – philipxy Apr 21 '20 at 22:42
  • id NOT IN (..values..) == id IN (..values..) is not an axiom/fact/truth in SQL. "NOT IN is not NOT (IN)". Also see my comment on the current version of the answer by GMB about how that answer's example is wrong. – philipxy Apr 21 '20 at 23:16
  • @wildplasser See my edited last comment. My point is, both your shorthand & longhand are unhelpful--and the longhand is not evident from that shorthand--because they are just too sloppy to communicate how NULLs are treated. – philipxy Apr 21 '20 at 23:22
  • Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL (including constraints & indexes) & input as code formatted as a table. PS Here NULLs matter, but you don't actually show enough of your tables to account for the results you get. – philipxy Apr 22 '20 at 00:13

1 Answers1

2

What you are seeing is a null-safety problem. If any value returned by the not in subquery is null, all other values are ignored. We say that not in is not null-safe.

Imagine that the subquery returns: (1, 2, null). The not in condition becomes:

id <> 1 and id <> 2 and id <> null

The first two conditions evaluate as true, but the last one is unknown, which contaminates the whole predicate, that, in turns, returns unknown. As a consequence, all rows are evicted.

This is one of the reason why the use of not in is usually discouraged. You can simply rewrite this with not exists:

select name 
from contacts c 
where c.id = 20 and not exists(select 1 from jobs j where j.contact_id = c.id);
GMB
  • 216,147
  • 25
  • 84
  • 135
  • That expression `id <> 1 and id <> 2 and id <> null` is not the right expansion.--For id 1 the result of NOT IN is FALSE, even if there are NULLs in the RHS, but yours gives NULL, because the NULL from <>NULL cascades to the result. IN is =SOME but NOT IN is <>ALL. So `not( id=1 or ...)`. – philipxy Apr 21 '20 at 23:19
  • 2
    @philipxy: `not in (1,2,null)` **is** equivalent to the expression given by GMB –  Apr 22 '20 at 08:34
  • @a_horse_with_no_name You are talking about function call x IN (list) & X NOT IN (list) is NOT (x IN (list)) but the post & I--despite the pseudocode syntax used in the post--which expresses a query result not SQL IN list code--are talking about predicate x NOT IN (subquery) & what I wrote about it is correct. – philipxy Apr 22 '20 at 08:51
  • 1
    @philipxy: No I am not talking about a function call. `not in (....)` is logically equivalent to what GMB wrote. The equivalent in boolean logic to your expression `not (id=1 or ..)` **is** `id <> 1 and id <> 2 ...` –  Apr 22 '20 at 09:00