395

I've got the following two tables (in MySQL):

Phone_book
+----+------+--------------+
| id | name | phone_number |
+----+------+--------------+
| 1  | John | 111111111111 |
+----+------+--------------+
| 2  | Jane | 222222222222 |
+----+------+--------------+

Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 1  | 0945 | 111111111111 |
+----+------+--------------+
| 2  | 0950 | 222222222222 |
+----+------+--------------+
| 3  | 1045 | 333333333333 |
+----+------+--------------+

How do I find out which calls were made by people whose phone_number is not in the Phone_book? The desired output would be:

Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 3  | 1045 | 333333333333 |
+----+------+--------------+
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Philip Morton
  • 129,733
  • 38
  • 88
  • 97

9 Answers9

572

There's several different ways of doing this, with varying efficiency, depending on how good your query optimiser is, and the relative size of your two tables:

This is the shortest statement, and may be quickest if your phone book is very short:

SELECT  *
FROM    Call
WHERE   phone_number NOT IN (SELECT phone_number FROM Phone_book)

alternatively (thanks to Alterlife)

SELECT *
FROM   Call
WHERE  NOT EXISTS
  (SELECT *
   FROM   Phone_book
   WHERE  Phone_book.phone_number = Call.phone_number)

or (thanks to WOPR)

SELECT * 
FROM   Call
LEFT OUTER JOIN Phone_Book
  ON (Call.phone_number = Phone_book.phone_number)
  WHERE Phone_book.phone_number IS NULL

(ignoring that, as others have said, it's normally best to select just the columns you want, not '*')

Community
  • 1
  • 1
Alnitak
  • 334,560
  • 70
  • 407
  • 495
  • 4
    avoid IN, use EXISTS - the hint is in the question title – annakata Dec 15 '08 at 12:02
  • 36
    The left outer join is probably quickest in the general case as it prevents repeated execution of the subquery. – WOPR Dec 15 '08 at 21:40
  • Not to be picky, but the subquery on my suggestion returns select 'x' and not select * – Alterlife Dec 17 '08 at 08:36
  • 1
    yes - MySQL manual suggests that this is normal for an 'EXISTS' query – Alnitak Dec 17 '08 at 09:31
  • 4
    @Alnitak: In the second query you don't need `SELECT *` in the subquery. Instead, for instance, `SELECT 1`, should be pretty enough. – Alexander Abakumov Nov 06 '15 at 16:27
  • Re `SELECT 1` in above comment: AFAIK, `EXISTS( SELECT ..`, in modern DBs, optimizes to the same result, regardless of what you put after SELECT. Smart enough to know that `EXISTS` doesn't use the contents of the row. Nevertheless, it doesn't hurt from a readability standpoint to make it obvious that you aren't using the contents :) – ToolmakerSteve Feb 24 '19 at 00:50
  • NOTE: `LEFT OUTER JOIN` (from WOPR) may return multiple rows per row of `Call`, if `Phone_Book` has multiple matching entries. Any time you have `1:N` relationship between the two tables, you may get multiple rows. In this case, may be preferable to use Alterlife's `NOT EXISTS` approach. – ToolmakerSteve Feb 24 '19 at 01:52
119
SELECT Call.ID, Call.date, Call.phone_number 
FROM Call 
LEFT OUTER JOIN Phone_Book 
  ON (Call.phone_number=Phone_book.phone_number) 
  WHERE Phone_book.phone_number IS NULL

Should remove the subquery, allowing the query optimiser to work its magic.

Also, avoid "SELECT *" because it can break your code if someone alters the underlying tables or views (and it's inefficient).

WOPR
  • 5,313
  • 6
  • 47
  • 63
  • 13
    This is generally the most efficient method as it does not perform multiple passes on the second table ... hope some people are reading the comemnts. – Nerdfest Dec 15 '08 at 11:57
  • 6
    I would rather hope that people profile: unless you're a top SQL performance guru, telling in advance what will be the fastest is quite difficult (and depends on the DBMS engine you use). – bortzmeyer Dec 16 '08 at 08:59
  • 2
    Big O notation will easily tell you what you can expect to be fastest in this case. It's orders of magnitude different. – Jonesopolis Sep 30 '16 at 13:52
  • See [Afterlife's answer](https://stackoverflow.com/a/367873/199364) and my comment there, if there is a `1:N` relationship between your two tables. OR add `DISTINCT` as seen in [Vlado's answer](https://stackoverflow.com/a/14356484/199364) – ToolmakerSteve Feb 24 '19 at 01:55
31

The code below would be a bit more efficient than the answers presented above when dealing with larger datasets.

SELECT *
FROM Call
WHERE NOT EXISTS (
    SELECT 'x'
    FROM Phone_book
    WHERE Phone_book.phone_number = Call.phone_number
);
informatik01
  • 16,038
  • 10
  • 74
  • 104
Alterlife
  • 6,557
  • 7
  • 36
  • 49
  • 2
    As always, it's worth profiling the performance of the queries against your target dataset to choose the one with the best performance. SQL optimisers are good enough these days that the performance results are often surprising. – Greg Hewgill Dec 15 '08 at 09:44
  • 1
    An advantage of this approach (vs. LEFT OUTER JOIN by WOPR) is that it avoids returning multiple rows per row of `Call`, if there are multiple matching rows in `Phone_book`. That is, if there is a `1:N` relationship between your two tables. – ToolmakerSteve Feb 24 '19 at 01:54
  • 1
    I would START with this one - it directly represents the intent. If performance not good enough, make sure appropriate indexes exist. Only then, try the less-obvious `LEFT OUTER JOIN`, see if its performance is better. – ToolmakerSteve Feb 24 '19 at 02:05
7
SELECT DISTINCT Call.id 
FROM Call 
LEFT OUTER JOIN Phone_book USING (id) 
WHERE Phone_book.id IS NULL

This will return the extra id-s that are missing in your Phone_book table.

Vlado
  • 3,517
  • 2
  • 26
  • 24
5

I think

SELECT CALL.* FROM CALL LEFT JOIN Phone_book ON 
CALL.id = Phone_book.id WHERE Phone_book.name IS NULL
A dev
  • 930
  • 3
  • 12
  • 27
  • The `id` column in the `call` table is not the same value as the `id` column in the `Phone_book` table, so you can't join on these values. See WOPR's answer for a similar approach. – Michael Fredrickson Feb 15 '12 at 23:11
3
SELECT t1.ColumnID,
CASE 
    WHEN NOT EXISTS( SELECT t2.FieldText  
                     FROM Table t2 
                     WHERE t2.ColumnID = t1.ColumnID) 
    THEN t1.FieldText
    ELSE t2.FieldText
END FieldText       
FROM Table1 t1, Table2 t2
Ian Nelson
  • 57,123
  • 20
  • 76
  • 103
2

Alternatively,

select id from call
minus
select id from phone_number
elifekiz
  • 1,456
  • 13
  • 26
  • 1
    Not sure this answers the question as is (although the MINUS) operator is a new addition. This ended up in the low quality queue - you might like to enhance this answer. – ste-fu Aug 21 '17 at 12:30
1
SELECT name, phone_number FROM Call a
WHERE a.phone_number NOT IN (SELECT b.phone_number FROM Phone_book b)
JoshYates1980
  • 3,476
  • 2
  • 36
  • 57
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/10501884) – Dennis Kriechel Dec 09 '15 at 17:50
  • @DennisKriechel updated query so that it's more specific to the question. – JoshYates1980 Dec 09 '15 at 18:16
1

Don't forget to check your indexes!

If your tables are quite large you'll need to make sure the phone book has an index on the phone_number field. With large tables the database will most likely choose to scan both tables.

SELECT *
FROM   Call
WHERE  NOT EXISTS
  (SELECT *
   FROM   Phone_book
   WHERE  Phone_book.phone_number = Call.phone_number)

You should create indexes both Phone_Book and Call containing the phone_number. If performance is becoming an issue try an lean index like this, with only the phone number:

The fewer fields the better since it will have to load it entirely. You'll need an index for both tables.

ALTER TABLE [dbo].Phone_Book ADD CONSTRAINT [IX_Unique_PhoneNumber] UNIQUE NONCLUSTERED 
(
    Phone_Number
)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = ON) ON [PRIMARY]
GO

If you look at the query plan it will look something like this and you can confirm your new index is actually being used. Note this is for SQL Server but should be similar for MySQL.

With the query I showed there's literally no other way for the database to produce a result other than scanning every record in both tables.

enter image description here

Simon_Weaver
  • 140,023
  • 84
  • 646
  • 689