Your query would delete all rows from each set of dupes (as all share the same id
by which you select - that's what @wildplasser hinted at with subtle comments) and only initially unique rows would survive. So if it "deletes the whole table", that means there were no unique rows at all.
In your query, dupes are defined by (id)
alone, not by the whole row as your title suggests.
Either way, there is a remarkably simple solution:
DELETE FROM customer_info c
WHERE EXISTS (
SELECT FROM customer_info c1
WHERE ctid < c.ctid
AND c1 = c -- comparing whole rows
);
Since you deal with completely identical rows, the remaining way to tell them apart is the internal tuple ID ctid
.
My query deletes all rows, where an identical row with a smaller ctid
exists. Hence, only the "first" row from each set of dupes survives.
Notably, NULL
values compare equal in this case - which is most probably as desired. The manual:
The SQL specification requires row-wise comparison to return NULL if
the result depends on comparing two NULL values or a NULL and a
non-NULL. PostgreSQL does this only when comparing the results of two
row constructors (as in Section 9.23.5) or comparing a row constructor
to the output of a subquery (as in Section 9.22). In other contexts
where two composite-type values are compared, two NULL field values
are considered equal, [...]
If dupes are defined by id
alone (as your query suggests), then this would work:
DELETE FROM customer_info c
WHERE EXISTS (
SELECT FROM customer_info c1
WHERE ctid < c.ctid
AND id = c.id
);
But then there might be a better way to decide which rows to keep than ctid
as a measure of last resort!
Obviously, you would then add a PRIMARY KEY
to avoid the initial dilemma from reappearing. For the second interpretation, id
is the candidate.
Related:
About ctid
: