MYSQL (NOT IN) query is very slow

Question

i don't know why this query take 45 min to give me the result

table1= 831293 row

table2= 2000.000 row

SELECT ID, name FROM table1 WHERE name not IN (SELECT name FROM table2 )  LIMIT 831293

my.cnf file look like this :

max_allowed_packet  = 64M
innodb_buffer_pool_size = 8G
innodb_log_file_size    = 256M 
query_cache_type        = 1
query_cache_limit   = 1M

try running EXPLAIN on the query.. you probably dont have your indexes set up correctly... also look at NOT EXISTS as opposed to NOT IN because its usually faster — John Ruddell, Jun 11 '14 at 19:54
Please add the output of `EXPLAIN` and the table definitions. — VMai, Jun 11 '14 at 19:55

score 1 · Answer 1 · answered Jun 11 '14 at 19:58

1

This select is very expensive because in the worst case for every record in table1 you must go through 2.000.000 records in table2. I am not MySQL specialist, but on the Oracle platform I would consider indexing and table spaces for table2.

BTW, the LIMIT part is useless.

answered Jun 11 '14 at 19:58

Michal Krasny

5,434
7
36
64

i'm using limit to get a specific number of rows from table 1 – Ilyas Oirraq Jun 11 '14 at 20:06
I think, that in this case the LIMIT statement limits the whole result, not the table 1 rows. If you want to limit table 1, then use something like this SELECT ID, name FROM (select ID, name from table1 LIMIT 831293) WHERE name not IN (SELECT name FROM table2 ) – Michal Krasny Jun 12 '14 at 06:49

score 1 · Accepted Answer · answered Jun 11 '14 at 20:01

1

Do you have an index on name?

If you have to loop each time through table2 per row, you effectively loop through 831293*200000 rows, which is a lot.

When you have an index on name, you can get a major performance increase, as it is easier to search for a name in this index. (probably even O(1) as there will probably be a hashtable)

you can do this as follows

ALTER TABLE `table2` ADD INDEX (`name`)

answered Jun 11 '14 at 20:01

Pinna_be

4,517
2
18
34

Thomas Thank u dude your absolutly right, can u give me more explanation about how index on name make my query so much faster – Ilyas Oirraq Jun 11 '14 at 20:18
I,d like to, but here is an explanation that will be far better then mine: http://stackoverflow.com/questions/3567981/how-do-mysql-indexes-work – Pinna_be Jun 11 '14 at 20:52

matt stucky · Answer 3 · 2014-06-11T20:57:19.297

agreed on: (a) add the indexes and (b) the limit clause is useless.

consider a 'left outer join' - it will bring all rows from the "left" table even where no join value in the "right" table exists - then you filter out the non-desired "right" values.

something like:

SELECT t1.ID, t1.name 
FROM table1 t1 left outer join table2 t2 
on t1.name = t2.name
where t2.name is null;

note: sql above assumes name is unique in t2, and not unique in t1.

VJ Hil · Answer 4 · 2014-06-11T20:10:13.390

0

I think using Exist will be faster.Try This

select Id,name from
(
    select table1.ID as ID, table1.name as name  
    FROM table1 inner join table2 
    on table1.name=table2.name
)
    WHERE not EXISTS
    (
    select 1  
    FROM table1 inner join table2 
    on table1.name=table2.name);

edited Jun 11 '14 at 20:10

answered Jun 11 '14 at 20:01

VJ Hil

904
6
15

i get this message when i try to execute not exists ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE not EXISTS – Ilyas Oirraq Jun 11 '14 at 20:09

MYSQL (NOT IN) query is very slow

4 Answers4