finding duplicates and removing but keeping one value

Question

I currently have a URL redirect table in my database that contains ~8000 rows and ~6000 of them are duplicates.

I was wondering if there was a way I could delete these duplicates based on a certain columns value and if it matches, I am looking to use my "old_url" column to find duplicates and I have used

SELECT old_url
    ,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

however I'm not sure what I can do to remove them now as I don't want to lose every single one, just the duplicates. They are almost a match completely apart from sometimes the new_url is different and the url_id (GUID) is different in each time

When you're done, remember to add a `UNIQUE` constraint to the table, so that you don't have to do this job again. — Damien_The_Unbeliever, Aug 21 '13 at 07:08
The only issue is. I can't really fiddle with the database as it's not for personal usage, I will suggest this though. — Hello World, Aug 21 '13 at 07:09

score 2 · Accepted Answer · answered Aug 21 '13 at 07:14

2

In my opinion ranking functions and a CTE are the easiest approach:

WITH CTE AS
(
    SELECT old_url
          ,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
    FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1

Change ORDER BY DateColumn ASC accordingly to determine which records should be deleted and which record should be left alone. In this case i delete all newer duplicates.

answered Aug 21 '13 at 07:14

Tim Schmelter

450,073
74
686
939

Hey, thank you very much! This seems to work great for me. I really appreciate the help and thanks for the clear explanation and links to information. – Hello World Aug 21 '13 at 07:18

score 0 · Answer 2 · answered Aug 21 '13 at 07:06

If your table has a primary key then this is easy:

BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))

INSERT INTO #T VALUES 
    (1, 'foo'),
    (2, 'moo'),
    (3, 'foo'),
    (4, 'moo'),
    (5, 'foo'),
    (6, 'zoo'),
    (7, 'foo')

DELETE FROM #T WHERE Id NOT IN (
    SELECT MIN(Id) 
    FROM #T 
    GROUP BY OldUrl
    HAVING COUNT(OldUrl) = 1
    UNION 
    SELECT MIN(Id) 
    FROM #T 
    GROUP BY OldUrl
    HAVING COUNT(OldUrl) > 1)

SELECT * FROM #T

DROP TABLE #T

ROLLBACK

cool, i learn something from u =) – Low Chee Mun Aug 21 '13 at 07:13 — Low Chee Mun, Aug 21 '13 at 07:13

Low Chee Mun · Answer 3 · 2013-08-21T07:18:16.583

0

this is the sample to delete multiple record with guid, hope it can help u=)

DECLARE @t1 TABLE
(
DupID UNIQUEIDENTIFIER,
DupRecords NVARCHAR(255)
)

INSERT INTO @t1 VALUES 
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')

so now, a duplicated record with guid is created in @t1

;WITH CTE AS(
SELECT DupID,DupRecords, Rn = ROW_NUMBER()
OVER (PARTITION BY DupRecords ORDER BY DupRecords)
FROM @t1 
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE RN>1)

with query above, duplicated record is deleted from @t1, i use Row_number() to distinct each of the records

SELECT * FROM @t1

edited Aug 21 '13 at 07:18

answered Aug 21 '13 at 07:11

Low Chee Mun

610
1
4
9

I don't really understand that this is doing, please can you explain to me? – Hello World Aug 21 '13 at 07:13
k, i will update it now – Low Chee Mun Aug 21 '13 at 07:15
best is u run my query to c what is going on =) – Low Chee Mun Aug 21 '13 at 07:18

finding duplicates and removing but keeping one value

3 Answers3