0

How to delete duplicate data on a table which have kind data like these. I want to keep it with the latest updated_at at each attribute id.

Like as follows:

attribute id | created at          | product_id
1            | 2020-04-28 15:31:11 | 112235
4            | 2020-04-28 15:30:25 | 112235
1            | 2020-04-29 15:30:25 | 112236
4            | 2020-04-29 15:30:25 | 112236
  • https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/ – Yak O'Poe May 03 '20 at 09:16
  • Looks a bit like this: https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql – sequoia May 03 '20 at 11:32
  • it's good using this link, but why if i have another product_id then it'll deleted. – Deniel Lin May 03 '20 at 11:44
  • 1
    Unrelated to your problem, but: Postgres 9.3 is [no longer supported](https://www.postgresql.org/support/versioning/) you should plan an upgrade as soon as possible. –  May 04 '20 at 06:54

1 Answers1

2

You can use an EXISTS condition.

delete from the_table t1
where exists (select *
              from the_table t2
              where t2.created_at > t1.created_at
                and t2.attribute_id = t1.attribute_id);

This will delete all rows where another row for the same attribute_id exists that has bigger created_at value (thus keeping only the row with the highest created_at for each attribute_id). Note that if two created_at values are identical, nothing will be deleted for that attribute_id

Online example