11

I'm having a problem getting a query to work, which I think should work. It's in the form

SELECT DISTINCT a, b, c FROM t1 WHERE NOT IN ( SELECT DISTINCT a,b,c FROM t2 ) AS alias

But mysql chokes where "IN (" starts. Does mysql support this syntax? If not, how can I go about getting these results? I want to find distinct tuples of (a,b,c) in table 1 that don't exist in table 2.

user151841
  • 17,377
  • 29
  • 109
  • 171

6 Answers6

15

You should use not exists:

SELECT DISTINCT a, b, c FROM t1 WHERE NOT EXISTS (SELECT NULL FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)

Using NOT IN is not the best method to do this, even if you check only one key. The reason is that if you use NOT EXISTS the DBMS will only have to check indices if indices exist for the needed columns, where as for NOT IN it will have to read the actual data and create a full result set that subsequently needs to be checked.

Using a LEFT JOIN and then checking for NULL is also a bad idea, it will be painfully slow when the tables are big since the query needs to make the whole join, reading both tables fully and subsequently throw away a lot of it. Also, if the columns allow for NULL values checking for NULL will report false positives.

wich
  • 16,709
  • 6
  • 47
  • 72
  • You know what? I ran this query, and it was taking a long time ( longer than 10 minutes ), so I killed it. Then I set up temp tables for the two tables and inserted distinct information. Then I ran the query against the temp tables. Took one minute and 4 seconds. Why couldn't mysql optimize this query thusly? – user151841 Feb 11 '10 at 19:41
  • 1
    To be honest, mysql is rather stupid and slow in some regards. Oracle, MS SQL and PostgreSQL do a lot better in many regards. Of course it will also be a lot faster if you add indices to the columns of the tables, if you don't have them already. although that will go at the cost of insert time since the indices need to be updated each time, it's a matter of how many times you do each operation and which is the more time critical. – wich Feb 12 '10 at 07:33
4

I had trouble figuring out the right way to execute this query, even with the answers provided; then I found the MySQL documentation reference I needed:

SELECT DISTINCT store_type
FROM stores 
WHERE NOT EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type);

The trick I had to wrap my brain around was using the reference to the 'stores' table from the first query inside the subquery. Hope this helps (or helps others, since this is an old thread.)

From http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html

FrustratedWithFormsDesigner
  • 26,726
  • 31
  • 139
  • 202
Coty
  • 41
  • 1
0

As far as I know, NOT IN can only be used for 1 field at a time. And the field has to be specified in between "WHERE" and "NOT IN".

(Edit:) Try using a NOT EXISTS:

SELECT a, b, c 
FROM t1 
WHERE NOT EXISTS 
   (SELECT * 
   FROM t2 
   WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)

In addition, an inner join on a, b, and c being equal should give you all non-DISTINCT tuples, while a LEFT JOIN with a WHERE IS NULL clause should give you the DISTINCT ones, as Charles mentioned below.

froadie
  • 79,995
  • 75
  • 166
  • 235
  • Holy God, that would make a nightmare query. I only care about the values of a, b, or c depending on the values of the other two! – user151841 Feb 11 '10 at 18:03
  • How about using a join? Inner joining on all 3 fields should return all non-distinct tuples – froadie Feb 11 '10 at 18:05
  • Or how about using exists? SELECT a, b, c FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c). Not sure if this is exactly correct, I haven't had much experience with exists – froadie Feb 11 '10 at 18:12
0

SELECT DISTINCT t1.* FROM t1 LEFT JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE t2.a IS NULL

Charles
  • 90
  • 1
  • 6
  • This is a very bad idea, if the tables are big that left join will be painfully slow. You're doing alot of work, reading to both tables fully creating a left join result set, etc., which is simply unnecessary. – wich Feb 11 '10 at 18:35
0

Well, I'm going to answer my own question, in spite of all the great advice others gave.

Here's the proper syntax for what I was trying to do.

SELECT DISTINCT a, b, c FROM t1 WHERE (a,b,c) NOT IN ( SELECT DISTINCT a,b,c FROM t2 )

Can't vouch for the efficiency of it, but the broader questions I was implicitly putting was "How do I express this thought in SQL", not "How do I get a particular result set". I know that's unfair to everyone who took a stab, sorry!

user151841
  • 17,377
  • 29
  • 109
  • 171
  • 1
    Depending on the definition of columns a, b and c of t2 this could be wrong! If they allow for NULL values then the result for the NOT IN will always be unknown for any such value. NOT EXISTS *is* the right way to express this, NOT EXISTS was created for this. – wich Feb 12 '10 at 07:40
-1

Need to add a column list after the WHERE clause and REMOVE the alias.

I tested this with a similar table and it is working.

SELECT DISTINCT a, b, c 
FROM t1 WHERE (a,b,c)
NOT IN (SELECT DISTINCT a,b,c FROM t2)

Using the mysql world db:

-- dont include city 1, 2
SELECT DISTINCT id, name FROM city 
WHERE (id, name) 
NOT IN (SELECT id, name FROM city  WHERE ID IN (1,2))
Yada
  • 30,349
  • 24
  • 103
  • 144
  • It would be better to use NOT EXISTS, NOT IN will force a result set to be made for the sub query by reading the whole table, when you use NOT EXISTS no result set needs to be created for the sub query and if the columns are indexed, the NOT EXISTS will only read the indices. – wich Feb 11 '10 at 18:46
  • Are you sure about that? If the columns are not indexed NOT EXISTS is as slow NOT IN. – Yada Feb 11 '10 at 19:12