100

I have a MySQL query that joins two tables

  • Voters
  • Households

They join on voters.household_id and household.id.

Now what I need to do is to modify it where the voter table is joined to a third table called elimination, along voter.id and elimination.voter_id. However the catch is that I want to exclude any records in the voter table that have a corresponding record in the elimination table.

How do I craft a query to do this?

This is my current query:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'
ORDER BY `Last_Name` ASC
LIMIT 30 
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
gsueagle2008
  • 4,583
  • 10
  • 36
  • 46

4 Answers4

246

I'd probably use a LEFT JOIN, which will return rows even if there's no match, and then you can select only the rows with no match by checking for NULLs.

So, something like:

SELECT V.*
FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id
WHERE E.voter_id IS NULL

Whether that's more or less efficient than using a subquery depends on optimization, indexes, whether its possible to have more than one elimination per voter, etc.

Yves M.
  • 29,855
  • 23
  • 108
  • 144
NickZoic
  • 7,575
  • 3
  • 25
  • 18
  • 3
    +1 much faster on high-load then sub-queries + if U can do JOINs instead of sub-queries - just do JOINs they are much simpler for analizer. Another useful example, U may want to get result if there are some rows in a right table or if there are no one: `SELECT V.* FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id OR E.voter_id IS NULL` ex.: if U don't want to store all records in the right table for every row from the left. – Arthur Kushman Jul 04 '16 at 05:45
  • 1
    How would you modify this query to find rows that do not exist in `E`, when `E.voter_id` can be `NULL` in the dataset we're `JOIN`ing from? – Danny Beckett Jun 19 '18 at 17:36
  • You need to link the tables together with some common column or related value. But I think this could work (untested): `SELECT V.*, COUNT(E.*) AS \`countE\` FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id WHERE countE = 0;` – ericek111 Jun 29 '20 at 22:29
11

I'd use a 'where not exists' -- exactly as you suggest in your title:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'

AND NOT EXISTS (
  SELECT * FROM `elimination`
   WHERE `elimination`.`voter_id` = `voter`.`ID`
)

ORDER BY `Last_Name` ASC
LIMIT 30

That may be marginally faster than doing a left join (of course, depending on your indexes, cardinality of your tables, etc), and is almost certainly much faster than using IN.

Ian Clelland
  • 43,011
  • 8
  • 86
  • 87
9

There are three possible ways to do that.

  1. Option

    SELECT  lt.* FROM    table_left lt
    LEFT JOIN
        table_right rt
    ON      rt.value = lt.value
    WHERE   rt.value IS NULL
    
  2. Option

    SELECT  lt.* FROM    table_left lt
    WHERE   lt.value NOT IN
    (
    SELECT  value
    FROM    table_right rt
    )
    
  3. Option

    SELECT  lt.* FROM    table_left lt
    WHERE   NOT EXISTS
    (
    SELECT  NULL
    FROM    table_right rt
    WHERE   rt.value = lt.value
    )
    
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Dumindu Madushanka
  • 494
  • 1
  • 9
  • 19
-1

Be wary of "LEFT" JOINS - LEFT JOINS are essentially OUTER JOINS. Different RDBMS query parsers and optimizers may handle OUTER JOINS very differently. Take for instance, how LEFT (OUTER) JOINS are parsed by MySQL's query optimizer, and the difference in resulting execution plans they could evaluate to per iteration:

https://dev.mysql.com/doc/refman/8.0/en/outer-join-simplification.html

LEFT JOINS by their very nature are ALWAYS going to be NonDeterministic. IMO - they should not be used in Production code.

I prefer to write JOIN type statements in a more "old school" approach first, leaving out any specific JOIN declarations. Let the RDBMS query parser do what its designed to do - analyze your statement and translate it to most optimal execution plan based on its evaluation of your index stats and data model design. That said, the build in query parsers / optimizers can even get it wrong, trust me I've seen it happen many times. In general, I feel like taking this approach first generally provides sufficient baseline information to make informed further tuning decisions in most cases.

To illustrate - using the question query from this thread:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'

AND NOT EXISTS (
  SELECT * FROM `elimination`
   WHERE `elimination`.`voter_id` = `voter`.`ID`
)

ORDER BY `Last_Name` ASC
LIMIT 30

Consider it re-written without the explicit JOIN and NOT EXISTS statements above (assumes the non fully qualified fields in the WHERE clause belonged to the voter table):

SELECT v.`ID`, v.`Last_Name`, v.`First_Name`,
       v.`Middle_Name`, v.`Age`, v.`Sex`,
       v.`Party`, v.`Demo`, v.`PV`,
       h.`Address`, h.`City`, h.`Zip`
FROM `voter` v, `household` h, `elimination` e
WHERE v.`House_ID` = h.`id`
AND v.`ID` != e.`voter_id`
AND v.`CT` = '5'
AND v.`Precnum` = 'CTY3'
AND  v.`Last_Name`  LIKE '%Cumbee%'
AND  v.`First_Name`  LIKE '%John%'
ORDER BY v.`Last_Name` ASC
LIMIT 30;

Try writing some of your future SQL queries BOTH ways syntactically going forward, compare their results, and see what you think. Writing your SQL in the style I have suggested above comes with the added benefit of being more RDBMS agnostic, also.

Cheers!

  • 1
    Very long , keep it crisp and short ! – sidverma Jul 02 '21 at 12:47
  • Thanks for the advice - much appreciated. For all the years I've read through countless threads searching for answers myself - I've decided its my time to start giving back. I'll make sure to keep it concise from now on. – Christopher Bishop Jul 03 '21 at 14:29