1

Let's say I have about 25,000 records in two tables and the data in each should be the same. If I need to find any rows that are in table A but NOT in table B, what's the most efficient way to do this.

We've tried it as a subquery of one table and a NOT IN the result but this runs for over 10 minutes and almost crashes our site.

There must be a better way. Maybe a JOIN?

emersonthis
  • 32,822
  • 59
  • 210
  • 375

4 Answers4

3

Hope LEFT OUTER JOIN will do the job

select t1.similar_ID 
    , case when t2.similar_ID is not null then 1 else 0 end as row_exists
from table1 t1
left outer join (select distinct similar_ID from table2) t2
   on t1.similar_ID  = t2.similar_ID // your WHERE goes here
M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118
2

I would suggest you read the following blog post, which goes into great detail on this question:

Which method is best to select values present in one table but missing in another one?

And after a thorough analysis, arrives at the following conclusion:

However, these three methods [NOT IN, NOT EXISTS, LEFT JOIN] generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

If the performance you're seeing with NOT IN is not satisfactory, you won't improve this performance by switching to a LEFT JOIN / IS NULL or NOT EXISTS, and instead you'll need to take a different route to optimizing this query, such as adding indexes.

Michael Fredrickson
  • 36,839
  • 5
  • 92
  • 109
  • I love Quassnoi's posts - I just wish they were a little more frequent. That said, I think the latest version of MySQL must have improved the optimization of IN and EXISTS somehow, because testing those same DDLs and DMLs on my system returns almost entirely opposite performance statistics: LEFT JOIN: 12.43; NOT IN 6.04; NOT EXISTS: 5.93. Incidentally, a JOIN (0 results) executes in 0 seconds. – Strawberry Aug 01 '13 at 17:58
  • @Strawberry Good point... that analysis was done almost 4 years ago. I'm surprised that `LEFT JOIN` is taking twice as long as the other methods... but with `NOT IN` and `NOT EXISTS` within 2% of each other, I'd still suggest that the OP look into indexing these tables since an anti-join between two tables of 25k rows should take milliseconds, not minutes. – Michael Fredrickson Aug 01 '13 at 18:15
  • 1
    This dude knows what he is talking about. Indexing is the answer. – usumoio Aug 01 '13 at 19:51
0

Use exixts and not exists function instead

Select * from A where not exists(select * from B);
Praveen Prasannan
  • 7,093
  • 10
  • 50
  • 70
0

Left join. From the mysql documentation

If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:

SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id WHERE right_tbl.id IS NULL;

This example finds all rows in left_tbl with an id value that is not present in right_tbl (that is, all rows in left_tbl with no corresponding row in right_tbl).