Removing duplicates from a single column

Question

I have a table like this:

table

id
=====
1
1
2
2
6
6
7
5
5
9

I need to remove duplicates other than creating a new table. One way of doing it is:

create table_new (select distinct id from table )

Is there any delete command which removes the duplicates?

possible duplicate of [How to remove duplicate entries from a mysql db?](http://stackoverflow.com/questions/2867530/how-to-remove-duplicate-entries-from-a-mysql-db) — rob mayoff, Aug 08 '12 at 07:13
See here: http://stackoverflow.com/questions/11557757/cleaning-up-db-of-redundant-data/11557812#11557812 — Omesh, Aug 08 '12 at 11:11

valex · Answer 1 · 2012-08-08T18:39:11.207

1

IMHO in common case (Ansi SQL) you can't do it because without unique key field you can't distinguish one row from another so you can't delete it.

But in MySQL you can run this UGLY query. It uses @S sting to accumulate ID's so DON'T use it on a big table:

set @S:=',';
delete from t 
where if(LOCATE(CONCAT(',',ID,','),@S)>0,'del',@S:=CONCAT(@S,ID,','))='del'

edited Aug 08 '12 at 18:39

answered Aug 08 '12 at 15:16

valex

23,966
7
43
60

score 0 · Answer 2 · edited Oct 13 '12 at 03:29

First hit on google:

http://support.microsoft.com/kb/139444/en-us?fr=1

The article describes the use of a temporary table to hold the unique records, clearing the original table and adding the unique records back to it, you can probably do it by writing some stored procedure but i suppose you only have to do this once? then i would go with the method as described. Also to prevent future problem use the unique or primary key constraint on that field so that it is not possible to have 2 records with the same value.

score 0 · Answer 3 · answered Aug 08 '12 at 06:52

You can use cte:

;WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY ID 
                                       ORDER BY ( SELECT 0)) RN
         FROM   #MyTable)
DELETE FROM cte
WHERE  RN > 1

This will avoid the need to create a temporary or holding table for your distinct records. If you have additional columns that may contain a duplicate value, then you can add them after ID in the PARTITION BY expression.

Although I would be concerned as to why you have duplicate ID values.

If this is a row ID, then I would suggest using IDENTITY and auto-increment the IDs.
If it is an item or reference ID, then I would try to avoid adding duplicates (this could be completely wrong depending on your structure)

Kamal · Answer 4 · 2012-08-08T07:05:47.697

0

A single query to delete all duplicates (tested in Oracle)

delete from my_table where rowid not in (select min(rowid) from temp_table group by id);

Explanation:

Oracle gives every row a rowid by default (I am sure other databases have same or similar thing).

Let's look at the inner query first. Group by clause groups the rows based on some criteria, in our case Id. We can group based on multiple columns as well (group by col1, col2). When we say min(rowid), we are just making sure that it return one rowid in case there are multiple rows with duplicate columns.

The outer delete query is simple, it just deletes everything else (duplicates) from table which is not returned by our inner query

edited Aug 08 '12 at 07:05

answered Aug 08 '12 at 06:55

Kamal

5,462
8
45
58

Is there any standard delete query which works in all databases – syncdm2012 Aug 08 '12 at 07:06
@Mdharmendra Haven't worked much on mysql, but it seems this is what you are looking for http://stackoverflow.com/questions/2728413/equivalent-of-oracles-rowid-in-mysql – Kamal Aug 08 '12 at 07:10

score 0 · Answer 5 · answered Aug 08 '12 at 07:01

0

Sample code

http://sqlfiddle.com/#!3/56182/2

answered Aug 08 '12 at 07:01

Satinder singh

10,100
16
60
102

thanks but I have only one column in the table so your example does not fit to my situation – syncdm2012 Aug 08 '12 at 07:15

score 0 · Answer 6 · edited Aug 08 '12 at 14:34

I'm not sure if there is any direct implementation, but here is a function implementation which you might be interested at.

function RemoveDuplicates($TableName, $UniqueFieldName, $IDFieldName, $FirstFoundIDValue)
{
    $Query =    "DELETE  FROM ".$TableName." ".
                "WHERE    ".$IDFieldName." IN ".
                "( SELECT a.".$IDFieldName." ".
                    "FROM ".$TableName." a, ".$TableName." b ".
                    "WHERE (a.".$UniqueFieldName." = b.".$UniqueFieldName.") ".
                    "AND (a.".$IDFieldName." > ".$FirstFoundIDValue.") ".
                    "GROUP BY a.".$IDFieldName." ".
                    "HAVING COUNT(a.".$UniqueFieldName.") > 1 ".
                ")";
    db_query($Query);

// Example Use:            
// delete  from tblps_CommEquip 
// where PSID in    
// ( select a.PSID from tblps_CommEquip a, tblps_CommEquip b     
//      where    (a.CommEquipType = b.CommEquipType) 
//      AND (a.PSID > 80)    
//      group by a.PSID 
//      having count(a.CommEquipType) > 1
// )
}

score 0 · Answer 7 · edited Aug 08 '12 at 13:14

0

The best way for deleting duplicate rows is to provide a primary key column that uniquely identifies each record in the table.

edited Aug 08 '12 at 13:14

Peter DeWeese

18,141
8
79
101

answered Aug 08 '12 at 10:10

lojuego

11

Removing duplicates from a single column

7 Answers7