1

I have a table with two columns A and B, they contain values of same domain, let's say I want to find the values that only appears in A but not in B, i.e

A has 1,2,3 B has 2,3

so my answer will be A - B = {1}

so the SQL query I wrote is

SELECT DISTINCT(A) 
  FROM DB
 WHERE A NOT IN (SELECT DISTINCT(B) 
                   FROM DB);

It doesn't work and taking forever to response (8000 records approx), does anyone have a solution similar to the above? Thanks.

I found some thread e.g MySQL: difference of two result sets but doesn't really do what I want to do thou.

Community
  • 1
  • 1
drhanlau
  • 2,517
  • 2
  • 24
  • 42
  • The question you linked to is the [fastest means of determining what rows exist in one MySQL table that don't in another, where the columns compared are not nullable](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/) -- that's not the case if they are nullable. You'll have to explain why this "isn't doing what you want", preferrably with example data. – OMG Ponies May 07 '11 at 15:34
  • I did explain actually, it doesn't return any result as it takes forever and I don't know why. – drhanlau May 07 '11 at 15:35
  • possible duplicate of [MySQL: difference of two result sets](http://stackoverflow.com/questions/2723839/mysql-difference-of-two-result-sets) – OMG Ponies May 07 '11 at 15:47

3 Answers3

0

How does the below perform? If you're getting poor performance, ensure you've indexes on both columns.

SELECT DISTINCT DB1.A
FROM            DB DB1
LEFT JOIN       DB DB2
ON              DB2.B = DB1.A
WHERE           DB2.B IS NULL
Will A
  • 24,780
  • 5
  • 50
  • 61
0

When working with MySQL you have to remember that it has no ways of doing this query fast without indexes, because it does not support :

  • hash aggregation
  • hash joins
  • merge joins
  • hash IN() checks
  • etc
  • do not think about doing this :
    EXPLAIN ANALYZE SELECT DISTINCT a FROM t WHERE a NOT IN (SELECT b FROM t);
    HashAggregate  (actual time=7.283..7.428 rows=827 loops=1)
    ->  Seq Scan on t (actual time=4.159..6.778 rows=1697 loops=1)
        Filter: (NOT (hashed SubPlan 1))
          SubPlan 1
          ->  Seq Scan on t  (actual time=0.006..1.497 rows=8000 loops=1)
     Total runtime: 6.840 ms

Also MySQL (depending on the version) is more or less braindead regarding subqueries, although it tend to get better.

  • create an index on column A
  • create an index on column B

Try :

SELECT DISTINCT t1.a 
FROM table t1 
LEFT JOIN table t2 ON (t1.a=t2.b)
WHERE t2.b IS NULL

Or :

SELECT DISTINCT a 
FROM table t1 
WHERE NOT EXISTS( 
 SELECT * FROM t2 WHERE t2.b=t1.a
)
bobflux
  • 11,123
  • 3
  • 27
  • 27
0
SELECT DISTINCT d1.A 
FROM DB d1 LEFT JOIN DB d2 ON d1.A=d2.B 
WHERE de.B IS NULL

ALTER TABLE DB ADD INDEX a_idx(A);

ALTER TABLE DB ADD INDEX b_idx(B);
neocanable
  • 5,293
  • 2
  • 23
  • 28