11

Using MySQL 5.x I want to efficiently select all rows from table X where there is no related row in table Y satisfying some condition, e.g.

Give me all records in X where a related Y with foo = bar does NOT exist

SELECT count(id) FROM X
LEFT OUTER JOIN Y ON y.X_id = X.id AND y.foo = 'bar'
WHERE y....?

As I understand it, a left outer join is guaranteed to produce a row for each row in the left (first) table -- X in this case -- whether or not a satisfying row in the joined table was found. What I want to do is then select only those rows where no row was found.

It seems to me that y.X_id should be NULL if there is no matching record, but this test doesn't seem to work. Nor does y.X_id = 0 or !y.X_id.

Edits: corrected transcription error (ON not AS) which was pointed out by several responses. Fixed grammatical error.

podperson
  • 2,284
  • 2
  • 24
  • 24

4 Answers4

23
SELECT count(id) FROM X 
LEFT OUTER JOIN Y ON (y.X_id = X.id AND y.foo = 'bar')
WHERE y.X_id is null

You were close.

First do the join as normal, then select all rows for which a not null row in Y is in fact null, so you are sure there's a "no match" and not just a null value in Y.

Also note the typo (since corrected) you made in the query:

LEFT OUTER JOIN Y AS
-- should be
LEFT OUTER JOIN Y ON
-- This however is allowed
LEFT OUTER JOIN table2 as Y ON ....
podperson
  • 2,284
  • 2
  • 24
  • 24
Johan
  • 74,508
  • 24
  • 191
  • 319
  • Thanks. As I mention in my edit, the AS was a slip. (The actual query is, of course, a horrible mess so I was trying to simplify.) – podperson May 06 '11 at 15:08
4

Checking if the primary key of table Y is NULL would do the trick, which tells the join did not matched :

SELECT count(id) FROM X 
LEFT OUTER JOIN Y ON (y.X_id = X.id AND y.foo = 'bar')
WHERE y.Y_id is null
MaxiWheat
  • 6,133
  • 6
  • 47
  • 76
4

Johan's answer is correct 100%.

Besides that, there is also this option:

SELECT count(id)
FROM X 
WHERE NOT EXISTS
  ( SELECT * 
    FROM Y
    WHERE (y.X_id = X.id AND y.foo = 'bar')
  )

Depending on your table size and data distribution, this may be more efficient. Test and keep both ways for future reference.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • 1
    Can you elaborate on when NOT EXISTS is more efficient? Perhaps when table X is large and table Y is quite small? Just curious, thanks. – Jared Beck May 06 '11 at 04:34
  • 2
    The `LEFT JOIN / NOT NULL` solution and the `NOT EXISTS` one may result in different plans from the execution planner. This (may) depend on a lot of factors, like the DB system you use (mySQL or SQL-Server or other), the version of the system, the particular engine (InnoDB or MyISAM), the size of the tables, the distribution of the joined fields (like the percent of `X.id` that actually exists on `Y`), if there are `NULL`s in the fields or not, etc. So, I have a truly marvelous demonstration of this complex dependance which this margin is too narrow to contain. – ypercubeᵀᴹ May 06 '11 at 12:10
  • 1
    You can read this question where there's a lot of discussion on this matter and many links to material outside SO: http://stackoverflow.com/questions/2246772/whats-the-difference-between-not-exists-vs-not-in-vs-left-join-where-is-null – ypercubeᵀᴹ May 06 '11 at 12:13
  • Bottom line is that different plans will lead to (slightly or very) different execution times. Test with **your** data, in your machine, with various table sizes and settings use what's best in your case. (If course there is sign that the query is slow. If it works fast enough, no need to spend time optimizing). – ypercubeᵀᴹ May 06 '11 at 12:16
  • I tried NOT EXISTS (only after banging my head on the JOIN solution) but it exploded in my face (and I was pretty careful). At least in my experience I've found anything involving subqueries to be dramatically less performant than (careful) JOINs. But I'm working with MySQL5 so of course YMMV. Then again this Oracle guy suggests rewriting subqueries as joins to improve performance: http://www.dba-oracle.com/oracle_tips_subq_rewrite.htm – podperson May 06 '11 at 15:11
  • @podperson, Oracle advice does not work for MySQL, they are **very** different under the hood. Follow ypercube's advice, only testing will tell you (also MySQL is very different from version to version). – Johan May 06 '11 at 15:23
  • @podperson: *"exploded in my face"* as "it did not run giving error" or as "it run but gave not what I expected" or "it run correctly but it was devastating slow" ? – ypercubeᵀᴹ May 06 '11 at 15:46
  • @ypercube "it ran and brought the server to its knees". – podperson Sep 13 '11 at 14:40
  • @podperson: Can you supply the execution plan? (and the indexes on the 2 tables)? – ypercubeᵀᴹ Sep 13 '11 at 19:25
  • Been a while, but I can tell you all the fields relevant to the query were indexed. – podperson Jul 25 '19 at 20:23
-2

Why use an outer join? Couldn't you just do:

SELECT count(id) 
FROM X JOIN Y AS y.X_id = X.id AND y.foo <> 'bar'
StevieG
  • 8,639
  • 23
  • 31
  • That would select matches, OP is looking for "no match" – Johan May 04 '11 at 16:04
  • It would give him all rows from X where there is a matching row in Y which does not have y.foo = bar (notice the <> ). I've probably misunderstood the question... – StevieG May 04 '11 at 16:11
  • An `inner join` can **never** give nonmatching rows, it will just hide those rows from view, that's the problem. Notice how the outer join tests for both `y.X_id = X.id AND y.foo = 'bar'`, the exact opposite for that would be `y.X_id <> X.id OR y.foo <> 'bar'`, but that will result in a different outcome, with many matching Y rows per per X-row. – Johan May 04 '11 at 16:47
  • 1
    I did misunderstand the question.. My bad. – StevieG May 04 '11 at 16:56