MySQL difference in two columns

Question

I have a table with two columns A and B, they contain values of same domain, let's say I want to find the values that only appears in A but not in B, i.e

A has 1,2,3 B has 2,3

so my answer will be A - B = {1}

so the SQL query I wrote is

SELECT DISTINCT(A) 
  FROM DB
 WHERE A NOT IN (SELECT DISTINCT(B) 
                   FROM DB);

It doesn't work and taking forever to response (8000 records approx), does anyone have a solution similar to the above? Thanks.

I found some thread e.g MySQL: difference of two result sets but doesn't really do what I want to do thou.

The question you linked to is the [fastest means of determining what rows exist in one MySQL table that don't in another, where the columns compared are not nullable](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/) -- that's not the case if they are nullable. You'll have to explain why this "isn't doing what you want", preferrably with example data. — OMG Ponies, May 07 '11 at 15:34
I did explain actually, it doesn't return any result as it takes forever and I don't know why. — drhanlau, May 07 '11 at 15:35
possible duplicate of [MySQL: difference of two result sets](http://stackoverflow.com/questions/2723839/mysql-difference-of-two-result-sets) — OMG Ponies, May 07 '11 at 15:47

score 0 · Answer 1 · answered May 07 '11 at 15:38

0

How does the below perform? If you're getting poor performance, ensure you've indexes on both columns.

SELECT DISTINCT DB1.A
FROM            DB DB1
LEFT JOIN       DB DB2
ON              DB2.B = DB1.A
WHERE           DB2.B IS NULL

answered May 07 '11 at 15:38

Will A

24,780
5
50
61

bobflux · Answer 2 · 2011-05-07T15:48:15.553

When working with MySQL you have to remember that it has no ways of doing this query fast without indexes, because it does not support :

hash aggregation
hash joins
merge joins
hash IN() checks
etc
do not think about doing this :

    EXPLAIN ANALYZE SELECT DISTINCT a FROM t WHERE a NOT IN (SELECT b FROM t);
    HashAggregate  (actual time=7.283..7.428 rows=827 loops=1)
    ->  Seq Scan on t (actual time=4.159..6.778 rows=1697 loops=1)
        Filter: (NOT (hashed SubPlan 1))
          SubPlan 1
          ->  Seq Scan on t  (actual time=0.006..1.497 rows=8000 loops=1)
     Total runtime: 6.840 ms

Also MySQL (depending on the version) is more or less braindead regarding subqueries, although it tend to get better.

create an index on column A
create an index on column B

Try :

SELECT DISTINCT t1.a 
FROM table t1 
LEFT JOIN table t2 ON (t1.a=t2.b)
WHERE t2.b IS NULL

Or :

SELECT DISTINCT a 
FROM table t1 
WHERE NOT EXISTS( 
 SELECT * FROM t2 WHERE t2.b=t1.a
)

score 0 · Answer 3 · answered May 07 '11 at 15:44

0

SELECT DISTINCT d1.A 
FROM DB d1 LEFT JOIN DB d2 ON d1.A=d2.B 
WHERE de.B IS NULL

ALTER TABLE DB ADD INDEX a_idx(A);

ALTER TABLE DB ADD INDEX b_idx(B);

answered May 07 '11 at 15:44

neocanable

5,293
2
23
28

MySQL difference in two columns

3 Answers3