DELETE FROM case_file AS p
WHERE id NOT IN (
SELECT DISTINCT ON (serial_no) id -- id = PK
FROM case_file
ORDER BY serial_no, cfh_status_dt DESC, registration_no
);
This keeps the (one) latest row per serial_no
, choosing the smallest registration_no
if there are multiple candidates.
NULL
sorts last in default ascending order. So any row with a not-null registration_no
is preferred.
If you want the greatest registration_no
instead, to still sort NULL values last, use:
...
ORDER BY serial_no, cfh_status_dt DESC, registration_no DESC NULLS LAST
See:
If you have no PK (PRIMARY KEY
) or other UNIQUE NOT NULL
(combination of) column(s) you can use for this purpose, you can fall back to ctid
. See:
NOT IN
is typically not the most efficient way. But this deals with duplicates involving NULL values. See:
If there are many duplicates - and you can afford to do so! - it can be (much) more efficient to create a new, pristine table of survivors and replace the old table, instead of deleting the majority of rows in the existing table.
Or create a temporary table of survivors, truncate the old and insert from the temp table. This way depending objects like views or FK constraints can stay in place. See:
Surviving rows are simply:
SELECT DISTINCT ON (serial_no) *
FROM case_file
ORDER BY serial_no, cfh_status_dt DESC, registration_no;