6

I've got a table that has rows that are unique except for one value in one column (let's call it 'Name'). Another column is 'Date' which is the date it was added to the database.

What I want to do is find the duplicate values in 'Name', and then delete the ones with the oldest dates in 'Date', leaving the most recent one.

Seems like a relatively easy query, but I know very little about SQL apart from simple queries.

Any ideas?

GEOCHET
  • 21,119
  • 15
  • 74
  • 98
The KZA
  • 179
  • 1
  • 7

4 Answers4

6

Find duplicates and delete oldest one

alt text

Here is the Code

create table #Product (
    ID      int identity(1, 1) primary key,
    Name        varchar(800),
    DateAdded   datetime default getdate()
)

insert  #Product(Name) select 'Chocolate'
insert  #Product(Name,DateAdded) select 'Candy', GETDATE() + 1
insert  #Product(Name,DateAdded) select 'Chocolate', GETDATE() + 5
select * from #Product

;with Ranked as (
    select  ID, 
        dense_rank() 
        over (partition by Name order by DateAdded desc) as DupeCount
    from    #Product P
)
delete  R
from    Ranked R
where   R.DupeCount > 1

select * from #Product
Community
  • 1
  • 1
dance2die
  • 35,807
  • 39
  • 131
  • 194
  • This looks great, thank you very, very much, however it errors saying the DELETE statement conflicted with the REFERENCE constraint in another table, which I guess opens up another set of problems for me. I can however manually right click and delete the rows? – The KZA Mar 25 '09 at 05:36
  • If the row you are deleting is being referenced from another table, delete the row from that table first; using the ID retrieved from "Ranked" CTE in my above example. – dance2die Mar 25 '09 at 11:34
5

delete from table a1 where exists (select * from table a2 where a2.name = a1.name and a2.date > a1.date)

Lloyd
  • 51
  • 1
  • Running this gives me "Incorrect syntax near 'a1'." Is a1 a placeholder for something else? Sorry if that's a dumb question :) – The KZA Mar 25 '09 at 02:17
  • The problem is likely that "table" in the query should really be the name of the table containing the Name and Date columns, not literally the word "table", which isn't allowed in this context because it's a SQL keyword. – yukondude Mar 25 '09 at 02:59
0

I Just googled up and found this https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/

this one seems the easiest to read/understand to me:

DELETE FROM [SampleDB].[dbo].[Employee]
    WHERE ID NOT IN
    (
        SELECT MAX(ID) AS MaxRecordID
        FROM [SampleDB].[dbo].[Employee]
        GROUP BY [FirstName], 
                 [LastName], 
                 [Country]
    );

in your scenario you can just group by name and select the max date instead of Id

RagnaRock
  • 2,432
  • 7
  • 34
  • 56
  • He is looking to do it by date column, not id. – Mohy Eldeen Mar 09 '21 at 16:31
  • @MohyEldeen its the same concept he just needs to change the column ID by his DateAdded column (or whatever whe just wants to have the latest one). Not a reason to downvote this answer... – RagnaRock Mar 23 '21 at 11:59
0

You could probably achieve this with a self-join and a IS NOT NULL.

Joining on DELETE queries can be a little dangerous, because the more complex it is the more the risk of deleting more than you intend to in some circumstances.

But I would approach it like.

DELETE
  a.*
FROM
  mytable AS a
  LEFT JOIN mytable AS b ON
    b.date > a.date
    AND (b.name=a.name OR (b.date = a.date AND b.rowid>a.rowid))
WHERE
  AND b.rowid IS NOT NULL

The join and the IS NOT NULL finds every row for which there exists a newer row with the same name. It also handles the case of two rows with the same date correctly - if they have the same date, then it goes by rowid (whatever that is).

Hopefully something like this works.

thomasrutter
  • 114,488
  • 30
  • 148
  • 167