0

I need to do whast I describe as a "reverse join" -- that is, I need to create a list of rows in table A that are NOT in table B, where A.FNAME like B.FNAME and A.LNAME like B.LNAME. The point is to extract mailing list entries that were overlooked in a prior mailing. Unfortunately the primary keys are not consistent between the two tables. Table B failed to copy them from Table A when it was created originally, due to an inexperienced operator.

None of the search responses returned for "reverse join" seem to address this situation; am I trying to "prove a negative" here? I thought I could create a temporary table of the full mailing list, then DELETE from it each row that had a matching FNAME and LNAME in the shorter list, but I do not seem to be able to do this without destroying the shorter list in the process.

Jim Kyle
  • 31
  • 7
  • Search for "**anti-join**". See the pattern in the answer from Uueerdo... an outer join, to return all rows from the table on the left side, along with matching rows from the table on the right, and a twist in the WHERE clause to exclude rows that had at least one match from the right, leaving only rows from the table on the left side that don't have a match. – spencer7593 Mar 26 '18 at 22:25

2 Answers2

0

No join needed:

SELECT * 
FROM tableA
WHERE (fname, lname) NOT IN (SELECT fname, lname FROM tableB)

but can be done with one:

SELECT a.*
FROM tableA AS s
LEFT JOIN tableB as b ON a.fname = b.fname AND a.lname = b.lname
WHERE b.somefield_that_cannot_be_null IS NULL

The second version would normally be my first instinct (I have better luck with indexes when joining) but since the fields involved are unlikely to be indexed, and string comparison is relatively costly anyway, I think the intent of the first version is clearer.

Note: LIKE is unnecessary unless you are expecting wildcards (%) in the field values... in which case the join version is your only option from these two.

Uueerdo
  • 15,723
  • 1
  • 16
  • 21
  • I refer to that second pattern as an "anti-join" pattern, and I tend to prefer that. With the `NOT IN` it is imperative that the SELECT not return any NULL values. If we have a NOT NULL guarantee on both columns `fname` and `lname`, we're fine. Otherwise, in the more general case, we want to exclude rows with NULL values. To do that, we could include `WHERE fname IS NOT NULL AND lname IS NOT NULL` as part of the NOT IN suquery. – spencer7593 Mar 26 '18 at 22:22
0

As Uueerdo demonstrated, use an anti-join pattern.

SELECT a.*
  FROM `table A` a

LEFT
 JOIN `table B` b
   ON b.fname = a.fname 
  AND b.lname = a.lname

WHERE b.fname IS NULL 

Let's break that down a bit.

It's an outer join, to return all rows from A, along with matching rows from B, with a twist, the condition in the WHERE clause.

Any rows where we found a match in table B, we are guaranteed the both b.fname and b.lname will be non-null. (If either of the columns had been NULL, the equality condition in the join predicate wouldn't have been satisfied, and the row from B wouldn't have been matched.

Any rows from A that didn't have a match found in B will have NULL values for all of the columns in B. If we exclude all rows that have a non-NULL value in any column from B that is guaranteed to be non-NULL, what we are left with is rows from A that didn't have a match in B.

The anti-join is a very useful pattern to have in our SQL toolbelts.


An equivalent result can be obtained using a NOT EXISTS pattern. (See answer from Nick for a demonstration of that.)


Curiously, in MySQL 5.6, with appropriate index available, for the "anti-join" query pattern, the EXPLAIN output actually shows "not exists".

-- explain antijoin
id  select_type        table type key  rows    Extra        
--  ------------------ ----- ---- ---- ------  ------------------------------------
 1  SIMPLE             a     ALL       210339 
 1  SIMPLE             b     ref  FK_a      1  Using where; Not exists; Using index

For an equivalent "not exists" query, the EXPLAIN output is slightly different, showing a dependent subquery.

-- explain "not exists" equivalent
id  select_type        table type key  rows    Extra        
--  ------------------ ----- ---- ---- ------  ------------------------------------
 1  PRIMARY            a     ALL       210339  Using where
 2  DEPENDENT SUBQUERY b     ref  FK_a      1  Using index

Personally, I'd opt for the anti-join, because I tend to see the best performance with that. (It's possible that the NOT EXISTS pattern will give better performance, less likely we will see better performance with a NOT IN (subquery).

spencer7593
  • 106,611
  • 15
  • 112
  • 140