Mysql select where not in table

Question

I have 2 tables (A and B) with the same primary keys. I want to select all row that are in A and not in B. The following works:

select * from A where not exists (select * from B where A.pk=B.pk);

however it seems quite bad (~2 sec on only 100k rows in A and 3-10k less in B)

Is there a better way to run this? Perhaps as a left join?

select * from A left join B on A.x=B.y where B.y is null;

On my data this seems to run slightly faster (~10%) but what about in general?

Do you mean you have 100k rows in table A and 100k rows in table B? Or just around 300 rows in each table, meaning 100,000 row scans (or, hopefully, index scans). — ChrisInEdmonton, Dec 09 '08 at 20:00

score 62 · Answer 1 · edited Sep 04 '12 at 11:02

62

I think your last statement is the best way. You can also try

SELECT A.*    
from A left join B on 
    A.x = B.y
    where B.y is null

edited Sep 04 '12 at 11:02

Smok

246
1
7

answered Dec 09 '08 at 20:09

Nick Berardi

54,393
15
113
135

2

That doesn't make any sense. When B.y is null, A.x=B.y will never be true. You'll get all rows of A, not just those that have no matching row in B. – Bill Karwin Dec 09 '08 at 20:46
2

@Bill and yet it works! Plus it is the exact same thing as the second statement listed above. – Nick Berardi Dec 11 '08 at 14:12
1

@hamstar, yes, now it's correct, by changing `and` to `where`. – Bill Karwin Oct 16 '12 at 17:23
2

@Smok "I think your last statement is the best way. You can also try..." ~ Is yours not identical? – Neil Hillman Feb 19 '14 at 14:06
@BillKarwin It makes a lot of sense. Left joins include every A record at least once--once for each matching B, or one with NULL in the B columns if there is no matching B. From http://www.w3schools.com/sql/sql_join_left.asp: "The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match." The A's that have a mathcing B will have a non-null B.y for every occurence, while the A's with no B have one occurence with a null B.y – Isaac Betesh Jun 20 '14 at 15:57
1

@IsaacBetesh, thanks, but you should be aware that I posted my first comment above before hamstar's edit to fix the query logic. Check the edit history to see the original answer, which was wrong. – Bill Karwin Jun 20 '14 at 16:13

score 37 · Accepted Answer · answered Dec 09 '08 at 20:47

37

I use queries in the format of your second example. A join is usually more scalable than a correlated subquery.

answered Dec 09 '08 at 20:47

Bill Karwin

538,548
86
673
828

ChoNuff · Answer 3 · 2014-08-21T04:34:17.143

2

Joins are generally faster (in MySQL), but you should also consider your indexing scheme if you find that it's still moving slowly. Generally, any field setup as a foreign key (using INNODB) will already have an index set. If you're using MYISAM, make sure that any columns in the ON statement are indexed, and consider also adding any columns in the WHERE clause to the end of the index, to make it a covering index. This allows the engine to have access to all the data needed in the index, removing the need to make a second round-trip back to the original data. Keep in mind that this will impact the speed of inserts/updates/deletes, but can significantly increase the speed of the query.

edited Aug 21 '14 at 04:34

answered Sep 28 '10 at 21:58

ChoNuff

814
6
12

Actually, Raul, your edit makes it sound like you should create a separate index for the columns in the WHERE clause. For clarity, I meant to add the columns to the index (covering index), so the engine wouldn't have to make a round-trip back to the base data for the WHERE clause. Apologies for not being clear. – ChoNuff Aug 21 '14 at 04:31

score 2 · Answer 4 · answered May 27 '09 at 14:52

2

I also use left joins with a "where table2.id is null" type criteria.

Certainly seems to be more efficient than the nested query option.

answered May 27 '09 at 14:52

Dave Rix

1,621
1
11
17

score -2 · Answer 5 · edited Dec 14 '15 at 10:36

-2

This helped me a lot. Joins are always faster than Sub Queries to give results:

SELECT tbl1.id FROM tbl1 t1
LEFT OUTER JOIN tbl2 t2 ON t1.id = t2.id 
WHERE t1.id>=100 AND t2.id IS NULL ;

edited Dec 14 '15 at 10:36

Jamie Rees

7,973
2
45
83

answered Dec 14 '15 at 10:25

user3136147

17
2

Mysql select where not in table

5 Answers5

Linked

Related