1

I'm new to SQL in general and I need to delete all duplicates in a given database.

For the moment, I use this DB to experiment some things.

The table currently looks like this :

table content

I know I can find all duplicates using this query :

SELECT COUNT(*) AS NBR_DOUBLES, Name, Owner
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1

but I have a lot of trouble finding an adapted and updated solution to not only find all the duplicates, but also delete them all, only leaving one of each.

Thanks a lot for taking some of your time to help me.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
ilomax
  • 568
  • 1
  • 4
  • 24
  • 5
    You get a pass because you're new, but for future reference you should know it's considered very rude here to post an image of your sample data. Posting the sample data as text is easier for us to read, and it's better for you because it makes it easier for us to help you. – Joel Coehoorn Aug 01 '17 at 14:28
  • look here for more info on how to improve question:https://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/ – TheGameiswar Aug 01 '17 at 14:34
  • @JoelCoehoorn I'll take this into account when posting in the future, I'm sorry for the inconvenience. – ilomax Aug 01 '17 at 14:37
  • As for the reasons why posting data or code as picture is often discouraged, [here's a post on the meta](https://meta.stackoverflow.com/a/285557/4003419) that sums them up nicely. – LukStorms Aug 01 '17 at 15:13
  • Possible duplicate of [Finding duplicate rows in SQL Server](https://stackoverflow.com/questions/2112618/finding-duplicate-rows-in-sql-server) – Bartosz X Aug 01 '17 at 15:24

4 Answers4

1
;WITH numbered AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY Name, Owner ORDER BY Name, Owner) AS _dupe_num 
    FROM dbo.Animals
)
DELETE FROM numbered WHERE _dupe_num > 1;

This will delete all but one of each occurance with the same Name & Owner, if you need it to be more specific you should extend the PARTITION BY clause. If you want it to take in account the entire record you should add all your fields.

The record left behind is currently random, since it seems you do not have any field to have any sort of ordering on.

Jens
  • 3,249
  • 2
  • 25
  • 42
  • Solution is untested, but should work. Run in a transaction before doing anything rash. – Jens Aug 01 '17 at 14:32
1

What you want to do is use a projection that numbers each record within a given duplicate set. You can do that with a Windowing Function, like this:

SELECT Name, Owner
   ,Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
ORDER BY Name, Owner

This should give you results like this:

Name             Owner       RowNum
Ecstasy          Sacha         1
Ecstasy          Sacha         2
Ecstasy          Sacha         3
Gremlin          Max           1
Gremlin          Max           2
Gremlin          Max           3
Outch            Max           1
Outch            Max           2
Outch            Max           3

Now you want to convert this to a DELETE statement that has a WHERE clause targeting rows with RowNum > 1. The way to use a windowing function with a DELETE is to first include the windowing function as part of a common table expression (CTE), like this:

WITH dupes AS
(
    SELECT Name, Owner, 
       Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
   FROM dbo.animals
)
DELETE FROM dupes WHERE RowNum > 1;

This will delete later duplicates, but leave row #1 for each group intact. The only trick now is to make sure row #1 is the correct row, since not all of your duplicates have the same values for the Birth or Death columns. This is the reason I included the Birth column in the windowing function, while other answers (so far) have not. You need to decide if you want to keep the oldest animal or the youngest, and optionally change the Birth order in the OVER clause to match your needs.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
0

Use CTE. I will show you a sample :

Create table #Table1(Field1 varchar(100));

Insert into #Table1 values
('a'),('b'),('f'),('g'),('a'),('b');

Select * from #Table1;


WITH CTE AS(
   SELECT Field1,
       RN = ROW_NUMBER()OVER(PARTITION BY Field1 ORDER BY Field1)
   FROM #Table1
)
--SELECT * FROM CTE WHERE RN > 1
DELETE FROM CTE WHERE RN > 1

What I am doing is, numbering the rows. If there are duplicates based on PARTITION BY columns, it will be numbered sequentially, else 1.

Then delete those records whose count is greater than 1.

I won't spoon feed you solution hence you will have to play with PARTITION BY to reach your output

output :

Select * from #Table1;
Field1
---------
a
b
f
g
a
b

/*with cte as (...) SELECT * FROM CTE;*/

Field1  RN   
------- -----
a       1
a       2
b       1
b       2
f       1
g       1
Prabhat G
  • 2,974
  • 1
  • 22
  • 31
0

if NBR_DOUBLES had an ID field, I believe you could use this;

DELETE FROM NBR_DOUBLES WHERE ID IN
(
SELECT MAX(ID)
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1
)
pi511
  • 27
  • 6