Simply delete duplicate content in a SQL table

Question

Is there an easy way to remove duplicate rows from a SQL table, rather than fetching the whole table and deleting the rows that appear twice?

Thank you in advance

This is my table structure:

CREATE TABLE IF NOT EXISTS `mups` (
  `idgroupe` varchar(15) NOT NULL,
  `fan` bigint(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Which RDBMS are you using? Please also add the table structure and some sample data. – Mahmoud Gamal Jan 10 '13 at 14:27
You would also need to define what a duplicate is. Is the entire row duplicated? Is it a duplication in a specific column or set of columns? – mellamokb Jan 10 '13 at 14:28
There are tons of similar questions here. [Just one of them.](http://stackoverflow.com/questions/529098/removing-duplicate-rows-from-table-in-oracle) – bonsvr Jan 10 '13 at 14:31
Once you've removed duplicates, implement a `UNIQUE` constraint on your table (across whatever columns should be considered when deciding if two rows are duplicates) so that you don't have to do this repeatedly. – Damien_The_Unbeliever Jan 10 '13 at 14:58
Possible duplicate of [Deleting duplicate rows from a table](http://stackoverflow.com/questions/1043488/deleting-duplicate-rows-from-a-table) – Andriy M Jan 10 '13 at 15:01

score 1 · Answer 1 · answered Jan 10 '13 at 14:30

If you are using SQL Server:

Check this: SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

Sample Code using CTE:

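/* Delete duplicate records, keeping one row per (Col1, Col2) pair */
WITH CTE (Col1, Col2, DuplicateCount)
AS
(
-- Number the rows within each group of identical (Col1, Col2) values
SELECT Col1, Col2,
ROW_NUMBER() OVER(PARTITION BY Col1, Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
-- Row 1 of each group is kept; every later row is a duplicate
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO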

Kapil Khandelwal

This does not work! WITH mups (idgroupe,fan, DuplicateCount) AS ( SELECT idgroupe,fan, ROW_NUMBER() OVER(PARTITION BY idgroupe,fan ORDER BY idgroupe) AS DuplicateCount FROM mups ) DELETE FROM mups WHERE DuplicateCount > 1 – SoCkEt7 Jan 10 '13 at 15:07
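
The table in the question is declared with `ENGINE=InnoDB`, which points to MySQL, and MySQL versions before 8.0 support neither CTEs nor `ROW_NUMBER()`. For a table like `mups` that has no key column, one alternative is to copy the distinct rows into a new table and swap it in. The following is only a minimal sketch of that idea, run from Python against an in-memory SQLite database so that it can be executed as-is; the name `mups_dedup` and the sample rows are assumptions made for illustration, and the statements have not been tested on MySQL.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Replace mups with a copy holding one row per distinct (idgroupe, fan)."""
    conn.executescript(
        """
        -- mups_dedup is a hypothetical name for the temporary copy
        CREATE TABLE mups_dedup (
            idgroupe VARCHAR(15) NOT NULL,
            fan BIGINT DEFAULT NULL
        );
        -- SELECT DISTINCT collapses rows that are identical in every column
        INSERT INTO mups_dedup (idgroupe, fan)
            SELECT DISTINCT idgroupe, fan FROM mups;
        DROP TABLE mups;
        ALTER TABLE mups_dedup RENAME TO mups;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mups (idgroupe VARCHAR(15) NOT NULL, fan BIGINT DEFAULT NULL)"
)
# Sample data (assumed): g1 appears twice, g2 three times, g3 once
conn.executemany(
    "INSERT INTO mups VALUES (?, ?)",
    [("g1", 10), ("g1", 10), ("g2", 20), ("g2", 20), ("g2", 20), ("g3", None)],
)

rebuild_without_duplicates(conn)
remaining = conn.execute(
    "SELECT idgroupe, fan FROM mups ORDER BY idgroupe"
).fetchall()
print(remaining)
```

This treats two rows as duplicates only when both columns match. If a duplicate should be defined on `idgroupe` alone, the `SELECT DISTINCT` would have to become a `GROUP BY` with a rule for which `fan` value to keep.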

score 0 · Answer 2 · answered Jan 10 '13 at 14:29

Add a calculated column that takes the checksum of the entire row. Search for any duplicate checksums, rank and remove the duplicates.

Vinnie

Checksums can generate false positives. If the checksum is, say, 32-bits wide, then you only need ~80000 different rows for there to be >50% chance of two checksums being the same. – Damien_The_Unbeliever Jan 10 '13 at 14:56
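
The figure in the comment above can be checked with the usual birthday-problem approximation. The sketch below assumes the checksum behaves like a uniformly random 32-bit value, which real checksum functions only approximate; the function names are made up for this illustration.

```python
import math


def collision_probability(rows, bits=32):
    """Approximate chance that at least two of `rows` random checksums collide."""
    space = 2 ** bits
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N))
    return 1.0 - math.exp(-rows * (rows - 1) / (2.0 * space))


def rows_for_probability(p, bits=32):
    """Approximate number of rows at which the collision chance reaches p."""
    space = 2 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


p_80k = collision_probability(80_000)
threshold = rows_for_probability(0.5)
print(f"collision chance with 80,000 rows: {p_80k:.3f}")
print(f"rows needed for a 50% chance: {threshold:.0f}")
```

Under that assumption the 50% point falls a little below 80,000 rows, so rows flagged by a checksum match still need a column-by-column comparison before they are deleted.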

score 0 · Answer 3 · answered Jan 10 '13 at 14:31

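You can do something like this:

DELETE FROM yourTable WHERE tableID IN
(SELECT clone.tableID
 FROM yourTable origine,
  yourTable clone
 -- someField is a placeholder for whatever defines a duplicate
 WHERE clone.someField = origine.someField
  -- keep the row with the lowest tableID in each group
  AND clone.tableID > origine.tableID)

In the inner WHERE you can compare the indexes or compare any of the other fields, depending on how you identify your duplicates. Comparing tableID with itself alone would match every row, so the condition must say both what makes two rows duplicates and which copy to keep.

Note: this solution has the advantage of letting you choose what counts as a duplicate (if the PK changes, for example).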

score 0 · Answer 4 · answered Jan 10 '13 at 14:38

You can find the duplicates by joining the table to itself, doing a group by the fields you are looking for duplicates in, and a having clause where count is greater than one.

Let's say your table is named customers, and you're looking for duplicate values in the name field.

select cust_out.name, count(cust_count.name)
from customers cust_out
  inner join customers cust_count on cust_out.name = cust_count.name
group by cust_out.name
having count(cust_count.name) > 1

If you used this in a delete statement you would delete all the duplicate records, when you probably intend to keep one of them.

So to select the records to delete,

select cust_dup.id
from customers cust
  inner join customers cust_dup on cust.name = cust_dup.name and cust_dup.id > cust.id
group by cust_dup.id
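
As a rough sketch of how that last select could drive the actual delete, the example below runs it as a subquery against an in-memory SQLite database from Python. The sample rows and the index name `customers_name_uq` are assumptions for illustration. SQLite accepts a subquery on the table being deleted from; MySQL generally rejects that form unless the subquery is wrapped in a derived table, so treat this as an outline of the technique and not as MySQL-ready code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Sample data (assumed): Ann appears three times, Bob twice, Cid once
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Ann"), (4, "Ann"), (5, "Cid"), (6, "Bob")],
)

# Delete every row for which an older row (lower id) with the same name exists,
# so the lowest id in each group of duplicates survives.
conn.execute(
    """
    DELETE FROM customers
    WHERE id IN (
        SELECT cust_dup.id
        FROM customers cust
          INNER JOIN customers cust_dup
            ON cust.name = cust_dup.name AND cust_dup.id > cust.id
    )
    """
)

# Following the advice in the question's comments, a unique index stops the
# duplicates from coming back (customers_name_uq is a hypothetical name).
conn.execute("CREATE UNIQUE INDEX customers_name_uq ON customers (name)")
conn.commit()

survivors = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(survivors)
```

The `cust_dup.id > cust.id` condition is what decides which copy is kept; reversing the comparison would keep the newest row of each group instead.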
