Find unmatched rows between two tables

Question

Given this setup:

CREATE TABLE table1 (column1 text, column2 text);
CREATE TABLE table2 (column1 text, column2 text);

INSERT INTO table1 VALUES
   ('A', 'A')
 , ('B', 'N')
 , ('C', 'C')
 , ('B', 'A');

INSERT INTO table2 VALUES
   ('A', 'A')
 , ('B', 'N')
 , ('C', 'X')
 , ('B', 'Y');

How can I find missing combinations of (column1, column2) between these two tables? Rows not matched in the other table.

The desired result for the given example would be:

  C | C
  B | A
  C | X
  B | Y

There can be duplicate entries so we'd want to omit those.

For an unambiguous answer you should declare whether there can be NULL values and, if so, how to deal with those. — Erwin Brandstetter, Sep 29 '18 at 00:04

Erwin Brandstetter · Answer 1 · 2018-09-29T00:21:14.167

The devil is in the details with this seemingly simple task.

Short and among the fastest:

SELECT col1, col2
FROM        (SELECT col1, col2, TRUE AS x1 FROM t1) t1
FULL   JOIN (SELECT col1, col2, TRUE AS x2 FROM t2) t2 USING (col1, col2)
WHERE  (x1 AND x2) IS NULL;

The FULL [OUTER] JOIN includes all rows from both sides, but fills in NULL values for columns of missing rows. The WHERE conditions (x1 AND x2) IS NULL identifies these unmatched rows. Equivalent: WHERE x1 IS NULL OR x2 IS NULL.

To eliminate duplicate pairs, add DISTINCT (or GROUP BY) at the end - cheaper for few dupes:

SELECT DISTINCT col1, col2
FROM ...

If you have many dupes on either side, it's cheaper to fold before the join:

SELECT col1, col2
FROM        (SELECT DISTINCT col1, col2, TRUE AS x1 FROM t1) t1
FULL   JOIN (SELECT DISTINCT col1, col2, TRUE AS x2 FROM t2) t2 USING (col1, col2)
WHERE  (x1 AND x2) IS NULL;

It's more complicated if there can be NULL values. DISTINCT / DISTINCT ON or GROUP BY treat them as equal (so dupes with NULL values are folded in the subqueries above). But JOIN or WHERE conditions must evaluate to TRUE for rows to pass. NULL values are not considered equal in this, the FULL [OUTER] JOIN never finds a match for pairs containing NULL. This may or may not be desirable. You just have to be aware of the difference and define your requirements.

Consider the added demo in the SQL Fiddle

If there are no NULL values, no duplicates, but an additional column defined NOT NULL in each table, like the primary key, let's name each id, then it can be as simple as:

SELECT col1, col2
FROM   t1
FULL   JOIN t2 USING (col1, col2)
WHERE  t1.id IS NULL OR t2.id IS NULL;

D-Shih · Answer 2 · 2018-09-28T20:57:21.503

1

You can try to use not exists with a subquery, then use UNION ALL

select Column1,Column2   from table1 t1 
where NOT exists 
(
    select 1 
    FROM table2 t2
    where t1.Column1 = t2.Column1 or t1.Column2 = t2.Column2
)
UNION ALL
select Column1,Column2  from table2 t1 
where NOT exists 
(
    select 1 
    FROM table1 t2
    where t1.Column1 = t2.Column1 or t1.Column2 = t2.Column2
)

edited Sep 28 '18 at 20:57

answered Sep 28 '18 at 20:51

D-Shih

44,943
6
31
51

score 1 · Accepted Answer · answered Sep 28 '18 at 20:54

One method is union all:

select t1.col1, t1.col2
from t1
where (t1.col1, t1.col2) not in (select t2.col1, t2.col2 from t2)
union all
select t2.col1, t2.col2
from t2
where (t2.col1, t2.col2) not in (select t1.col1, t1.col2 from t1);

If there are duplicates within a table, you can remove them by using select distinct. There is no danger of duplicates between the tables.

score 1 · Answer 4 · edited Sep 28 '18 at 21:06

You can try set operations. EXCEPT to find the rows in table but not in the other and UNION to put the partial results into one.

(SELECT column1,
        column2
        FROM table1
 EXCEPT
 SELECT column1,
        column2
        FROM table2)
UNION
(SELECT column1,
        column2
        FROM table2
 EXCEPT
 SELECT column1,
        column2
        FROM table1);

If you don't need duplicate elimination you can try to use the ALL variants (EXCEPT ALL and UNION ALL). They are generally faster, as the DBMS doesn't have to look for and eliminate duplicates.

score 1 · Answer 5 · answered Sep 28 '18 at 20:55

Seems to be a perfect task for set operations:

  ( --all rows from table 1 missing in table 2
    select *
    from table1
    except 
    select *
    from table2
  )
  union all -- both select return distinct rows
  ( -- all rows in table 2 missing in table 1
    select *
    from table2
    except 
    select *
    from table1
  )

Find unmatched rows between two tables

5 Answers5