3

Possible Duplicate:
delete duplicate records in SQL Server

I have a table in which unique records are denoted by a composite key, such as (COL_A, COL_B).

I have checked and confirmed that I have duplicate rows in my table by using the following query:

select COL_A, COL_B, COUNT(*)
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
order by count(*) desc

Now, I would like to remove all duplicate records but keep only one.

Could someone please shed some light on how to achieve this with 2 columns?

EDIT: Assume the table only has COL_A and COL_B

Community
  • 1
  • 1
czchlong
  • 2,434
  • 10
  • 51
  • 65
  • 1
    Are the rest of the columns the same? How do you decide which you want to keep? – FJT Oct 17 '12 at 17:26
  • @FionaT, In this case, I don't care which one I keep as long as I keep one. This is because they all have the same (COL_A, COL_B) pair. For simplicity's sake, there are only COL_A and COL_B. – czchlong Oct 17 '12 at 17:43
  • 1
    If you don't have another field (hopefully a key field), then your delete statement will delete all records. You have to have another field to differentiate the records. – Data Masseur Oct 17 '12 at 18:53
  • http://stackoverflow.com/questions/3317433/delete-duplicate-records-in-sql-server Come on all I did was search SO for TSQL delete duplicate records. – paparazzo Oct 17 '12 at 21:17

2 Answers2

2

1st solution, It is flexible, because you can add more columns than COL_A and COL_B :

-- create table with identity filed
-- using idenity we can decide which row we can delete
create table MY_TABLE_COPY
(
  id int identity,
  COL_A varchar(30), 
  COL_B  varchar(30)
  /*
     other columns
  */
)
go
-- copy data
insert into MY_TABLE_COPY (COL_A,COL_B/*other columns*/)
select COL_A, COL_B /*other columns*/
from MY_TABLE
group by COL_A, COL_B
having count(*) > 1
-- delete data from  MY_TABLE
-- only duplicates (!)
delete MY_TABLE
from MY_TABLE_COPY c, MY_TABLE t
where c.COL_A=t.COL_A 
and c.COL_B=t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE (COL_A, COL_B /*other columns*/)
select t.COL_A, t.COL_B /*other columns*/
from MY_TABLE_COPY t
where t.id = (
               select max(id) 
               from MY_TABLE_COPY c  
               where t.COL_A = c.COL_A
               and  t.COL_B = c.COL_B
             ) 
go

2nd solution If you have really two columns in MY_TABLE you can use:

-- create table and copy data
select distinct COL_A, COL_B
into MY_TABLE_COPY
from MY_TABLE
-- delete data from  MY_TABLE 
-- only duplicates (!)
delete MY_TABLE
from MY_TABLE_COPY c, MY_TABLE t
where c.COL_A=t.COL_A 
and c.COL_B=t.COL_B
go
-- copy data without duplicates
insert into MY_TABLE
select t.COL_A, t.COL_B
from MY_TABLE_COPY t
go
Robert
  • 25,425
  • 8
  • 67
  • 81
  • You could have just used a SELECT DISTINCT on your INSERT into MY_TABLE_COPY. Then you wouldn't need the where clause with the subquery when reinserting them. Up vote for working around the lack of uniqueness though. – Data Masseur Oct 17 '12 at 19:18
  • @Data Masseur Thanks. Yes I could used a `SELECT DISTINCT`, but It will work if in MY_TABLE we have more then two columns. – Robert Oct 17 '12 at 19:28
  • @DataMasseur I edited the post with you suggestions. – Robert Oct 17 '12 at 19:38
  • Thanks for the detailed answers, will try and let you know. – czchlong Oct 17 '12 at 20:44
1

Try:

-- Copy Current Table
SELECT * INTO #MY_TABLE_COPY FROM MY_TABLE 

-- Delte all rows from current able
DELETE FROM MY_TABLE

-- Insert only unique values, removing your duplicates
INSERT INTO MY_TABLE
SELECT DISTINCT * FROM #MY_TABLE_COPY

-- Remove Temp Table
DROP TABLE #MY_TABLE_COPY

That should work as long as you don't break any foreign keys when deleting rows from MY_TABLE.

Shawn H
  • 550
  • 6
  • 15