I have some funny bug with Postgres 9.5.1
I got 2 tables that contain related data contacts (id, name)
and jobs (id, contact_id, name)
.
I'm not sure of the validity of this query (given the curious behavior explained just after).
-- get unassigned contacts
select * from contacts where id not in (select contact_id from jobs);
Edit : The following case was how i tried to analyze the issue. See the end of post and comments to get why the query is not correct.
When testing with a case of contact id=20 without job, I got some (IMO) strange result (a notable difference in results between a select query and a join equivalent).
First, I need to assert some prerequisites (step A). Next, I show the result with join (step B). Finally, I show the result using subquery (step D). (Step C is the complementary request of D and is only here to highlight what I found strange).
A-0. check that there is datas in both tables : OK
select count(distinct id) from contacts;
--> returns 10100
select count(distinct id) from jobs;
--> returns 12000
select count(distinct id) from contacts where id in (select contact_id from jobs);
--> returns 10000
A-1. get name in table contacts for id=20 : OK
select name from contacts where id=20;
--> returns "NAME"
A-3. check contact id=20 is NOT in table jobs : OK
select id from jobs where contact_id=20;
--> returns nothing (0 row)
B. get name and (null) job id for contact id=20 with join : OK
select c.id, c.name, j.id
from contacts c
left join jobs j
on j.contact_id=c.id
where c.id=20;
--> returns 20, "NAME", <NULL>
C. get contact id=20 only if it is assigned in jobs : OK
select name from contacts where id in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's the expected result)
D. get contact id=20 only if is NOT assigned in jobs : KO
select name from contacts where id not in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's not the expected result - "NAME")
Funny conclusion
C and D queries got the same results.
In logical terms, this could means that in pgsql:
id NOT IN (..values..) == id IN (..values..)
FALSE == TRUE
Can a "Postgres guru" find me a nice explanation or should I call the FBI?
Epilogue
following the answers
My query
select * from contacts where id not in (select contact_id from jobs);
was not correct, because NOT IN
can't handle NULL values. Therefore, it is not the right selector to check (non)existence of a value.
See NULL values inside NOT IN clause.
The correct query is the following :
-- to get unassigned contacts
select * from contacts c where not exists (select 1 from jobs where contact_id=c.id);
For a specified id :
select * from contacts c where not exists (select 1 from jobs where contact_id=c.id) and id=20;
This query works too :
select * from contacts where id not in (select contact_id from jobs where contact_id is not null);