Assuming all columns NOT NULL
, this will do:
SELECT *
FROM tbl t1
WHERE EXISTS (
SELECT FROM tbl t2
WHERE (t1.*) = (t2.*)
AND t1.ctid <> t2.ctid
);
ctid
is a system column, the "tuple identifier" / "item pointer" that can serve as poor-man's PK in the absence of an actual PK (which you obviously don't have), and only within the scope of a single query. Related:
If columns can be NULL
, (more expensively) operate with IS NOT DISTINCT FROM
instead of =
. See:
(t1.*) = (t2.*)
is comparing ROW values. This shorter syntax is equivalent: t1 = t2
unless a column of the same name exists in the underlying tables, in which case the second form fails while the first won't. See:
Index?
If any of the columns has a particularly high cardinality (many distinct values, few duplicates), let's call it hi_cardi_column
for the purpose of this answer, a plain btree index on just that column can be efficient for your task. A combination of a few, small columns with a multicolumn index can work, too. The point is to have a small, fast index or the overhead won't pay.
SELECT *
FROM tbl t1
WHERE EXISTS (
SELECT FROM tbl t2
WHERE t1.hi_cardi_column = t2.hi_cardi_column -- logically redundant
AND (t1.*) = (t2.*)
AND t1.ctid <> t2.ctid
);
The added condition t1.hi_cardi_column = t2.hi_cardi_column
is logically redundant, but helps to utilize said index.
Other than that I don't see much potential for index support as all rows of the table have to be visited anyway, and all columns have to be checked.