0

I wanted to know if there is an easy way to remove duplicates from a table sql.

Rather than fetch the whole table and delete the data if they appear twice.

Thank you in advance

This is my structure :

CREATE TABLE IF NOT EXISTS `mups` (
  `idgroupe` varchar(15) NOT NULL,
  `fan` bigint(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Gonzalo.-
  • 12,512
  • 5
  • 50
  • 82
SoCkEt7
  • 2,237
  • 5
  • 22
  • 30

4 Answers4

1

If you are using Sql Server

Check this: SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

Sample Code using CTE:

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO
Kapil Khandelwal
  • 15,958
  • 2
  • 45
  • 52
  • this does not work ! WITH mups (idgroupe,fan, DuplicateCount) AS ( SELECT idgroupe,fan, ROW_NUMBER() OVER(PARTITION BY idgroupe,fan ORDER BY idgroupe) AS DuplicateCount FROM mups ) DELETE FROM mups WHERE DuplicateCount > 1 – SoCkEt7 Jan 10 '13 at 15:07
0

Add a calculated column that takes the checksum of the entire row. Search for any duplicate checksums, rank and remove the duplicates.

Vinnie
  • 3,889
  • 1
  • 26
  • 29
  • Checksums can generate false positives. If the checksum is, say, 32-bits wide, then you only need ~80000 different rows for there to be >50% chance of two checksums being the same. – Damien_The_Unbeliever Jan 10 '13 at 14:56
0

you can do something like this :

DELETE from yourTable WHERE tableID in 
(SELECT clone.tableID 
 from yourTable origine,
  yourTable clone 
 where clone.tableID= origine.tableID)

But in the WHERE, you can either compare the indexes or compare each other fields...

depending on how you find your doubles.

note, this solution has the advantage of letting you choose what IS a double (if the PK changes for example)

Ludovic Migneault
  • 140
  • 1
  • 4
  • 19
0

You can find the duplicates by joining the table to itself, doing a group by the fields you are looking for duplicates in, and a having clause where count is greater than one.

Let's say your table name is customers, and your looking for duplicate name fields.

select cust_out.name, count(cust_count.name)
from customers cust_out
  inner join customers cust_count on cust_out.name = cust_count.name
group by cust_out.name
having count(cust_count.name) > 1

If you use this in a delete statement you would be deleting all the duplicate records, when you probably intend to keep on of the records.

So to select the records to delete,

select cust_dup.id
from customers cust
  inner join customers cust_dup on cust.name = cust_dup.name and cust_dup.id > cust.id
group by cust_dup.id
Marlin Pierce
  • 9,931
  • 4
  • 30
  • 52