2

This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
tuskity
  • 41
  • 5
  • I'm thinking this question needs some more thought and information. I'm not seeing ```DELETE``` in the example. Furthermore you seem to be wanting to do an ```INSERT ``` or ```UPDATE```("What syntax would replace the current table with just the results of this one?") What is the current table? – Adrian Klaver Jul 20 '20 at 18:12
  • This is a good discussion of several different options in the question and answers if you ignore the "slowness" issue: https://stackoverflow.com/questions/47402098/postgresql-slow-delete-from-where-exists – Mike Organek Jul 20 '20 at 18:19

1 Answers1

1

Assuming values in firstname, lastname and startdate are never NULL, this simple query with a NOT EXISTS anti-semi-join does the job:

DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)

Can there be identical values in startdate? Then you need a tiebreaker ...

Typically faster than using a subquery with row_number(). There are a hundred and one ways to make this faster, depending on your precise situation and requirements. See:

If compared columns can be NULL, consider:

There is a whole dedicated tag for . Combine it with to narrow down:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Your method looks much more simple than what I was trying to do. The example code isn't my exact situation. I have a serial_no for each row that has duplicates. The compared date doesn't have nulls, but there can be ties. Can you elaborate on how I could include a tie breaker? – tuskity Jul 20 '20 at 22:45
  • @tuskity: Please clarify the question accordingly. Tell us what ties can happen and how you *want* to break ties, then I can tell you how to implement it. You might alternatively start a new question for this with details. – Erwin Brandstetter Jul 20 '20 at 22:50
  • I set up a new question for the topic here: https://stackoverflow.com/questions/63005307/how-to-break-ties-when-comparing-columns-in-sql – tuskity Jul 20 '20 at 23:43