1

I have this users table:

users

and this relationships table:

enter image description here

So each user is paired with another one in the relationships table.

Now I want to get a list of users which are not in the relationships table, in either of the two columns (user_id or pair_id).

How could I write that query?

First try:

SELECT users.id
FROM users
LEFT OUTER JOIN relationships
ON users.id = relationships.user_id
WHERE relationships.user_id IS NULL;

Output:

enter image description here

This is should display only 2 results: 5 and 6. The result 8 is not correct, as it already exists in relationships. Of course I'm aware that the query is not correct, how can I fix it?

  • I'm using PostgreSQL.
David Morales
  • 17,816
  • 12
  • 77
  • 105

4 Answers4

4

You need to compare to both values in the on statement:

SELECT u.id
FROM users u LEFT OUTER JOIN
     relationships r
     ON u.id = r.user_id or u.id = r.pair_id
WHERE r.user_id IS NULL;

In general, or in an on clause can be inefficient. I would recommend replacing this with two not exists statements:

SELECT u.id
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.user_id) AND
      NOT EXISTS (SELECT 1 FROM relationships r WHERE u.id = r.pair_id);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • +1 thanks for the alternative more efficient version – David Morales May 29 '14 at 15:34
  • BTW, this is in general terms called a "left anti-join". – Craig Ringer May 30 '14 at 01:52
  • @David . . . I think this may be faster than Erwin's answer (which I say with some trepidation) if you have indexes on both `relationships(user_id)` and `relationships(pair_id)`. – Gordon Linoff May 30 '14 at 02:12
  • @GordonLinoff even having one more select? – David Morales May 30 '14 at 11:22
  • @David . . . Many databases will not take advantage of two indexes for an `in` or `or` clause in a correlated subquery. I do not know if Postgres would take advantage of indexes in the formulation in Erwin's answer, but I am skeptical. – Gordon Linoff May 30 '14 at 11:26
  • @GordonLinoff Ok you convinced me, and also the number of votes on your answer. Switching the accepted answer to yours :) – David Morales May 30 '14 at 12:19
  • @David: Why don't you just test with `EXPLAIN ANALYZE`? Of course, either of these queries is going to use matching indexes in Postgres. It depends on various details which of the queries will be faster. Mine is shorter and more versatile. – Erwin Brandstetter May 30 '14 at 12:49
  • @David . . . Erwin is right. You should test the answers and see which gives you the performance/execution plan that you want. – Gordon Linoff May 30 '14 at 13:40
  • Both queries are fast enough for my needs. Gordon's query has 9 steps (2x Hash anti join, 2x Hash cond, 3x Seq scan, 2x Hash), while Erwin's has 5 (1x Nested loop anti join, 1x Join filter, 2x Seq scan, 1x Materialize). The final runtime is barely the same for both. It's very difficult to choose one. I will choose Gordon's because it seems clearer to me, although Erwin's uses one less select, both are very good pieces of code. Thanks :) – David Morales May 31 '14 at 16:46
2

I like the set operators

select id from users
  except
select user_id from relationships
  except
select pair_id from relationships

or

select id from users
  except
(select user_id from relationships
   union
 select pair_id from relationships
)
Glenn
  • 8,932
  • 2
  • 41
  • 54
2

This is a special case of:
Select rows which are not present in other table

I suppose this will be simplest and fastest:

SELECT u.id
FROM   users u
WHERE  NOT EXISTS (
   SELECT 1
   FROM   relationships r
   WHERE  u.id IN (r.user_id, r.pair_id)
   );

In Postgres, u.id IN (r.user_id, r.pair_id) is just short for:(u.id = r.user_id OR u.id = r.pair_id).
The expression is transformed that way internally, which can be observed from EXPLAIN ANALYZE.

To clear up speculations in the comments: Modern versions of Postgres are going to use matching indexes on user_id, and / or pair_id with this sort of query.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

Something like:

select u.id
from users u
where u.id not in (select r.user_id from relationships r) 
and u.id not in (select r.pair_id from relationships r) 
Alan Hay
  • 22,665
  • 4
  • 56
  • 110