Can I delete rows/replace table based on temp table

Question

This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.

I'm thinking this question needs some more thought and information. I'm not seeing ```DELETE``` in the example. Furthermore you seem to be wanting to do an ```INSERT ``` or ```UPDATE```("What syntax would replace the current table with just the results of this one?") What is the current table? — Adrian Klaver, Jul 20 '20 at 18:12
This is a good discussion of several different options in the question and answers if you ignore the "slowness" issue: https://stackoverflow.com/questions/47402098/postgresql-slow-delete-from-where-exists — Mike Organek, Jul 20 '20 at 18:19

Erwin Brandstetter · Accepted Answer · 2020-07-20T22:10:25.863

Assuming values in firstname, lastname and startdate are never NULL, this simple query with a NOT EXISTS anti-semi-join does the job:

DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)

Can there be identical values in startdate? Then you need a tiebreaker ...

Typically faster than using a subquery with row_number(). There are a hundred and one ways to make this faster, depending on your precise situation and requirements. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

If compared columns can be NULL, consider:

How to delete duplicate rows without unique identifier

There is a whole dedicated tag for duplicate-removal. Combine it with postgres to narrow down:

https://stackoverflow.com/questions/tagged/duplicates+postgresql

Your method looks much more simple than what I was trying to do. The example code isn't my exact situation. I have a serial_no for each row that has duplicates. The compared date doesn't have nulls, but there can be ties. Can you elaborate on how I could include a tie breaker? — tuskity, Jul 20 '20 at 22:45
@tuskity: Please clarify the question accordingly. Tell us what ties can happen and how you *want* to break ties, then I can tell you how to implement it. You might alternatively start a new question for this with details. — Erwin Brandstetter, Jul 20 '20 at 22:50
I set up a new question for the topic here: https://stackoverflow.com/questions/63005307/how-to-break-ties-when-comparing-columns-in-sql — tuskity, Jul 20 '20 at 23:43

Can I delete rows/replace table based on temp table

1 Answers1