0

I have a speed problem when comparing two tables. Let's say I have the following tables.

Table A 14,000 records

name(varchar) , join_id(int)

Table b 54,209 records

second_name , join_id(int)

Now I want the find the rows which exists in table A , but not in B connected by joint_id Ordered by random ( sort of )

I tried the following :

 SELECT a.name , b.second_name 
    FROM a
    LEFT OUTER JOIN b ON a.joint_id = b.joint_id
    WHERE b.joint_id IS NULL
    LIMIT 0,10

The query took ages and is messing up the server , so my question is ;

Is there a faster way to accomplish this ?

EDIT : I removed the RAND() , this can be solved otherwise. But I still have the same problem. I will try out the suggestions below and see if I can improve the query time.

Paolo_Mulder
  • 1,233
  • 1
  • 16
  • 28

2 Answers2

3

Notice I gave your second column a fixed value of NULL, since when b does not exist, b.second_name => NULL. The crux of the problem is that ORDER BY RAND() require a full scan to put a rand() against each record. You cannot avoid a table scan.

SELECT a.name , NULL AS second_name 
FROM a
WHERE NOT EXISTS (Select * from b WHERE a.joint_id = b.joint_id)
ORDER BY RAND() 
LIMIT 0,10;

You could possibly make it faster by not having to drag all columns from table a, assuming you have an ID on table a. http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

SELECT a.name , NULL AS second_name 
FROM a
WHERE a.ID in (
    select id from a
    WHERE NOT EXISTS (Select * from b WHERE a.joint_id = b.joint_id)
    ORDER BY RAND()
    LIMIT 0,10)
RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • 1
    +1 @K Ivanov: I would expect the NOT EXISTS to perform better. See: [Left outer join vs NOT EXISTS](http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/) – Joe Stefanelli Feb 09 '11 at 19:49
  • Bear in mind that my blog post (which Joe referenced) is for MS SQL Server, not MySQL. No conclusions should be drawn from it as to how MySQL handles joins and exists – GilaMonster Feb 10 '11 at 20:37
  • @K I You're right, it is the same performance in MySQL (which implements LEFT JOIN + IS NULL using an Anti-SemiJoin that shortcuts as soon as a match `exists|not exists` is satisfied). But the crux of the post is not about rewriting, but about (1) b.second_name being null (2) rand = table scan (3) not using all columns – RichardTheKiwi Feb 10 '11 at 20:48
2

Doesn't b.second_name show as NULL in all rows?

Besides checking index use (with EXPLAIN), you can also check this version:

SELECT a.name 
    FROM a
    WHERE a.joint_id NOT IN ( SELECT b.joint_id
                                FROM b 
                            )
    ORDER BY RAND() 
    LIMIT 0,10

Caution: if the b.joint_id has NOT NULL attribute, the NOT IN behaves like the NOT EXISTS examples that other wrote.

If the b.joint_id however can be Nullable, do not use this answer. It will give wrong results.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235