Get rows from two tables with multiple matches in the other

Question

I have two all the similar fields tables:

table_1: 
field_1, field_2, field_3, field_4
 

table_2: 
field_1, field_2, field_3, field_4

Here field_1 can be used as foreign key to join both tables.

I would like to get all the rows from table_1 and table_2 that have at least one row in table_1 but more than one in table_2, or vice versa.

So far I have tried these related solutions:

Similar fields or exactly the same fields? How many duplicates on average, roughly? And your Postgres version? — Erwin Brandstetter, Sep 30 '21 at 22:45

Erwin Brandstetter · Answer 1 · 2021-09-30T23:18:45.707

Assuming both tables have the same row type: all the same column names and types (at least compatible), you can work with row types to simplify:

SELECT (t).*
FROM  (SELECT t, count(*) AS ct1 FROM table_1 t GROUP BY 1) t1
JOIN  (SELECT t, count(*) AS ct2 FROM table_2 t GROUP BY 1) t2 USING (t)
WHERE t1.ct1 > 1
   OR t2.ct2 > 1;

Group duplicates and remember the count in each table.
Join the two tables, which removes all rows without match in the other table.
Filter rows where at least one side has more than one copy.
In the outer SELECT decompose the row type to get columns as usual.

I don't return row counts. If you need those, add ct1 and ct2 in the outer SELECT.

This requires every column type to support the equality operator =.
A prominent example that does not is json. (But jsonb does.) See:

How to query a json column for empty objects?

If you have such columns, cast to text to work around it. Or you can work with hash values - which also helps performance for very wide rows and/or many duplicates. Related:

Why doesn't my UNIQUE constraint trigger?

score 1 · Accepted Answer · answered Sep 30 '21 at 22:13

1

One way of getting all records from table_1 which have more than one matching record in table_2 is to count the number of matching records in a subquery, and put a condition on it:

SELECT * 
FROM table_1 t1 
WHERE (SELECT count(*) 
       FROM table_2 t2 
       WHERE t1.field_1 = t2.field_1) > 1

If you're looking to have both sides of this in one query, you can combine them with a UNION:

SELECT * 
FROM table_1 t1 
WHERE (SELECT count(*) 
       FROM table_2 t2 
       WHERE t1.field_1 = t2.field_1) > 1
UNION
SELECT * 
FROM table_2 t2 
WHERE (SELECT count(*) 
       FROM table_1 t1 
       WHERE t1.field_1 = t2.field_1) > 1

answered Sep 30 '21 at 22:13

Blue Star

1,932
1
10
11

This returns duplicates where there is more than one copy in both tables. Also, executing a correlated subquery for every row is pretty expensive for bigger tables. – Erwin Brandstetter Sep 30 '21 at 23:07
The language used in the question suggested to me that returning duplicates is the desired behaviour, but your answer is definitely better than mine. – Blue Star Sep 30 '21 at 23:15
Then I got it backwards. (But it would make more sense to just report a row count in this case ...) – Erwin Brandstetter Sep 30 '21 at 23:16

Get rows from two tables with multiple matches in the other

2 Answers2