1

I have a query that I want to add some log to, to drop results that successfully match when I add one more table to the JOIN.

I'm accomplishing this now with an additional WHERE IN statement instead:

SELECT blah blah FROM donation
WHERE donation.id NOT IN (SELECT donation_id FROM donation_relation)

I just worry that selecting all ID fields from this donation_relation table in the subquery will begin dragging when the table starts growing. What's the more efficient way (if it exists) to use JOIN to accomplish this exclusion? The two tables are joinable on donation.id and donation_relation.donation_id

Thanks!

4 Answers4

2

The common way in Standard SQL uses correlated NOT EXISTS (additionally NOT IN might have some non-intuitive side-effects when NULLs are involved):

SELECT blah blah FROM donation as d
WHERE  NOT EXISTS
   (SELECT * FROM donation_relation as dr 
    where dr.donation_id = d.donation_id)
dnoeth
  • 59,503
  • 4
  • 39
  • 56
0

LEFT OUTER JOIN alternative:

SELECT blah blah
FROM donation
  LEFT JOIN donation_relation ON donation.id = donation_relation.donation_id 
WHERE donation_relation.donation_id IS NULL;

Probably faster, especially when MySQL.

EXISTS version:

SELECT blah blah FROM donation
WHERE NOT EXISTS (SELECT donation_id FROM donation_relation
                  WHERE donation.id = donation_relation.donation_id )
jarlh
  • 42,561
  • 8
  • 45
  • 63
  • Thanks !!! I wound up using LEFT JOIN as suggested here - but huge kudos & thanks to every other answerer, too! – user3550402 May 16 '15 at 17:31
0

Exactly JOIN is faster then sub query in RDBMS. This Join vs. sub-query more explain about this. Use following query:

SELECT blah blah 
FROM   donation 
       LEFT JOIN donation_relation 
              ON donation.id = donation_relation.donation_id 
WHERE  donation_relation.donation_id IS NULL 
Community
  • 1
  • 1
Manwal
  • 23,450
  • 12
  • 63
  • 93
  • Nope, JOIN is not generally faster, might be true in some DBMSes for some set of data. There's a high probablity that NOT EXISTS is more efficient. The "correct" answer of your link is not correct for decent optimizers as they can rewrite a subquery to a join anyway. – dnoeth May 16 '15 at 12:34
  • @dnoeth what these answers http://stackoverflow.com/a/2577224/2236219 http://stackoverflow.com/a/2577188/2236219 explain then? – Manwal May 16 '15 at 12:38
  • 1
    It's an *opinion* which might be true for a specific DBMS, but it's not backed up by any proof and can't be generalized. Simply read the comments. And the 2nd link also doesn't claim it's always faster. Both answers are five years old and there is constant evolution in optimizer algorithms. – dnoeth May 16 '15 at 12:43
0

You can use NOT EXISTS (anti-join):

SELECT blah blah FROM donation
WHERE NOT EXISTS (
    SELECT 1 FROM donation_relation
    WHERE donation_relation.donation_id = donation.id
)
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
Mick Mnemonic
  • 7,808
  • 2
  • 26
  • 30