1

i am using sql server 2005. i have a table contains duplicated rows. how can i eliminated those duplicate rows in that table? for e.g., the table may contain 3 similar rows in which i want to delete 2 rows and keep the original

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Kumaran T
  • 617
  • 1
  • 9
  • 24

3 Answers3

1

If you just want to eliminate duplicate records from your result set, you can use the DISTINCT command:

SELECT DISTINCTI field1, field2 FROM...

If you want to delete those duplicate records, you can use COUNT to detect which records have more than one instance, and then deleting them with a subquery

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
franz976
  • 333
  • 3
  • 14
  • If you post code or XML, **please** highlight those lines in the text editor and click on the "code" button (101 010) on the editor toolbar to nicely format and syntax highlight it! – marc_s Nov 26 '10 at 10:54
  • the action you said above will delete all the duplicated rows but 1 want to keep one – Kumaran T Nov 26 '10 at 10:55
  • Not if you use MAX(id) in the subquery to return you just the first id instance – franz976 Nov 26 '10 at 11:26
  • Using a DISTINCT may be a lot more expensive than using the CTE method above with a large dataset or wide table. – DataWriter Nov 27 '10 at 12:40
0

First you can copy duplicate record into another table like as following way

Select fieldnames into #temp from table1 group by fieldnames having count (*) > 1

then remove that record from original table

delete from table1 where fieldname in (select fieldnames from #temp)

and finally copy record from temporary table to original source table.

insert table1 select * from #temp

through above mentioned steps you can eliminate duplicate record from table.

Pankaj Agarwal
  • 11,191
  • 12
  • 43
  • 59
0

For understanding purpose, lets take a simple table Employee with below schema

EmployeeId - int
EmployeeName    varchar(50)
Age int

Lets populate with duplicate values. Please note primary key is not duplicated in this case

INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (1,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (2,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (3,'Sam',24)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (4,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (5,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (6,'Tom',22)
GO 

we can make use of CTE in finding the duplicate rows. Gather duplicate rows by using Group by/Count statement. Once Duplicate rows are Identified, we select those rows from the main table using join condition. Now rank those Rows and delete all the rows apart from the rows with rank 1. I find this a lot more elegant.

WITH TotalDuplicates(EmployeeName,Age,Total) AS 
(
    SELECT EmployeeName,Age,COUNT(employeeId)  AS Total  FROM Employee 
    GROUP BY EmployeeName,Age
    HAVING COUNT(employeeId) > 1
)
,DistinctRows(EmployeeId,EmployeeName,Age) AS 
(
    SELECT E.EmployeeId,E.EmployeeName,E.Age FROM Employee AS E
    INNER JOIN TotalDuplicates AS D 
    ON (E.EmployeeName = D.EmployeeName AND E.Age = D.Age)
)
,OrderedDuplicateTables(EmployeeId,EmployeeName,Age,Ranking) AS 
(
    SELECT 
        EmployeeId,
        EmployeeName,
        Age,
        RANK() OVER (PARTITION BY EmployeeName, Age ORDER BY EmployeeId DESC) 
    FROM DistinctRows 
)

DELETE FROM Employee
WHERE EmployeeId IN (SELECT EmployeeId FROM OrderedDuplicateTables WHERE Ranking > 1)
Hunter
  • 2,370
  • 2
  • 20
  • 24