How can I get records from one table which do not exist in a related table?

Question

I have this users table:

users

and this relationships table:

enter image description here

So each user is paired with another one in the relationships table.

Now I want to get a list of users which are not in the relationships table, in either of the two columns (user_id or pair_id).

How could I write that query?

First try:

SELECT users.id
FROM users
LEFT OUTER JOIN relationships
ON users.id = relationships.user_id
WHERE relationships.user_id IS NULL;

Output:

enter image description here

This is should display only 2 results: 5 and 6. The result 8 is not correct, as it already exists in relationships. Of course I'm aware that the query is not correct, how can I fix it?

I'm using PostgreSQL.

score 4 · Accepted Answer · answered May 29 '14 at 13:02

4

You need to compare to both values in the on statement:

SELECT u.id
FROM users u LEFT OUTER JOIN
     relationships r
     ON u.id = r.user_id or u.id = r.pair_id
WHERE r.user_id IS NULL;

In general, or in an on clause can be inefficient. I would recommend replacing this with two not exists statements:

SELECT u.id
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id) AND
      NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id);

answered May 29 '14 at 13:02

Gordon Linoff

1,242,037
58
646
786

+1 thanks for the alternative more efficient version – David Morales May 29 '14 at 15:34
BTW, this is in general terms called a "left anti-join". – Craig Ringer May 30 '14 at 01:52
@David . . . I think this may be faster than Erwin's answer (which I say with some trepidation) if you have indexes on both `relationships(user_id)` and `relationships(pair_id)`. – Gordon Linoff May 30 '14 at 02:12
@GordonLinoff even having one more select? – David Morales May 30 '14 at 11:22
@David . . . Many databases will not take advantage of two indexes for an `in` or `or` clause in a correlated subquery. I do not know if Postgres would take advantage of indexes in the formulation in Erwin's answer, but I am skeptical. – Gordon Linoff May 30 '14 at 11:26
@GordonLinoff Ok you convinced me, and also the number of votes on your answer. Switching the accepted answer to yours :) – David Morales May 30 '14 at 12:19
@David: Why don't you just test with `EXPLAIN ANALYZE`? Of course, either of these queries is going to use matching indexes in Postgres. It depends on various details which of the queries will be faster. Mine is shorter and more versatile. – Erwin Brandstetter May 30 '14 at 12:49
@David . . . Erwin is right. You should test the answers and see which gives you the performance/execution plan that you want. – Gordon Linoff May 30 '14 at 13:40
Both queries are fast enough for my needs. Gordon's query has 9 steps (2x Hash anti join, 2x Hash cond, 3x Seq scan, 2x Hash), while Erwin's has 5 (1x Nested loop anti join, 1x Join filter, 2x Seq scan, 1x Materialize). The final runtime is barely the same for both. It's very difficult to choose one. I will choose Gordon's because it seems clearer to me, although Erwin's uses one less select, both are very good pieces of code. Thanks :) – David Morales May 31 '14 at 16:46

score 2 · Answer 2 · answered May 29 '14 at 13:13

2

I like the set operators

select id from users
  except
select user_id from relationships
  except
select pair_id from relationships

or

select id from users
  except
(select user_id from relationships
   union
 select pair_id from relationships
)

answered May 29 '14 at 13:13

Glenn

8,932
2
41
54

score 2 · Answer 3 · edited May 23 '17 at 11:50

2

This is a special case of:
Select rows which are not present in other table

I suppose this will be simplest and fastest:

SELECT u.id
FROM   users u
WHERE  NOT EXISTS (
   SELECT 1
   FROM   relationships r
   WHERE  u.id IN (r.user_id, r.pair_id)
   );

In Postgres, u.id IN (r.user_id, r.pair_id) is just short for:(u.id = r.user_id OR u.id = r.pair_id).
The expression is transformed that way internally, which can be observed from EXPLAIN ANALYZE.

To clear up speculations in the comments: Modern versions of Postgres are going to use matching indexes on user_id, and / or pair_id with this sort of query.

edited May 23 '17 at 11:50

Community

1
1

answered May 29 '14 at 14:26

Erwin Brandstetter

605,456
145
1,078
1,228

+1 This seems very efficient to me, and using just 2 selects. It also lets tweak the select freely. Thanks. – David Morales May 29 '14 at 15:33

score 0 · Answer 4 · answered May 29 '14 at 13:01

0

Something like:

select u.id
from users u
where u.id not in (select r.user_id from relationships r) 
and u.id not in (select r.pair_id from relationships r)

answered May 29 '14 at 13:01

Alan Hay

22,665
4
56
110

How can I get records from one table which do not exist in a related table?

4 Answers4