i am using sql server 2005. i have a table contains duplicated rows. how can i eliminated those duplicate rows in that table? for e.g., the table may contain 3 similar rows in which i want to delete 2 rows and keep the original
-
Please **search** first before posting a question - this has been asked over and over and over again - – marc_s Nov 26 '10 at 10:56
-
1possible duplicate of [SQL: Deleting duplicate records in SQL Server](http://stackoverflow.com/questions/1772592/sql-deleting-duplicate-records-in-sql-server) – marc_s Nov 26 '10 at 10:56
-
marc_s that link helps me a lot – Kumaran T Nov 26 '10 at 11:01
-
Get more lots http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows Looks like ppl like to hate duplicates – Gennady Vanin Геннадий Ванин Nov 26 '10 at 14:54
3 Answers
If you just want to eliminate duplicate records from your result set, you can use the DISTINCT command:
SELECT DISTINCTI field1, field2 FROM...
If you want to delete those duplicate records, you can use COUNT to detect which records have more than one instance, and then deleting them with a subquery
-
If you post code or XML, **please** highlight those lines in the text editor and click on the "code" button (101 010) on the editor toolbar to nicely format and syntax highlight it! – marc_s Nov 26 '10 at 10:54
-
the action you said above will delete all the duplicated rows but 1 want to keep one – Kumaran T Nov 26 '10 at 10:55
-
Not if you use MAX(id) in the subquery to return you just the first id instance – franz976 Nov 26 '10 at 11:26
-
Using a DISTINCT may be a lot more expensive than using the CTE method above with a large dataset or wide table. – DataWriter Nov 27 '10 at 12:40
First you can copy duplicate record into another table like as following way
Select fieldnames into #temp from table1 group by fieldnames having count (*) > 1
then remove that record from original table
delete from table1 where fieldname in (select fieldnames from #temp)
and finally copy record from temporary table to original source table.
insert table1 select * from #temp
through above mentioned steps you can eliminate duplicate record from table.

- 11,191
- 12
- 43
- 59
For understanding purpose, lets take a simple table Employee with below schema
EmployeeId - int
EmployeeName varchar(50)
Age int
Lets populate with duplicate values. Please note primary key is not duplicated in this case
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (1,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (2,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (3,'Sam',24)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (4,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (5,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (6,'Tom',22)
GO
we can make use of CTE in finding the duplicate rows. Gather duplicate rows by using Group by/Count statement. Once Duplicate rows are Identified, we select those rows from the main table using join condition. Now rank those Rows and delete all the rows apart from the rows with rank 1. I find this a lot more elegant.
WITH TotalDuplicates(EmployeeName,Age,Total) AS
(
SELECT EmployeeName,Age,COUNT(employeeId) AS Total FROM Employee
GROUP BY EmployeeName,Age
HAVING COUNT(employeeId) > 1
)
,DistinctRows(EmployeeId,EmployeeName,Age) AS
(
SELECT E.EmployeeId,E.EmployeeName,E.Age FROM Employee AS E
INNER JOIN TotalDuplicates AS D
ON (E.EmployeeName = D.EmployeeName AND E.Age = D.Age)
)
,OrderedDuplicateTables(EmployeeId,EmployeeName,Age,Ranking) AS
(
SELECT
EmployeeId,
EmployeeName,
Age,
RANK() OVER (PARTITION BY EmployeeName, Age ORDER BY EmployeeId DESC)
FROM DistinctRows
)
DELETE FROM Employee
WHERE EmployeeId IN (SELECT EmployeeId FROM OrderedDuplicateTables WHERE Ranking > 1)

- 2,370
- 2
- 20
- 24