1

I currently have a URL redirect table in my database that contains ~8000 rows and ~6000 of them are duplicates.

I was wondering if there was a way I could delete these duplicates based on a certain columns value and if it matches, I am looking to use my "old_url" column to find duplicates and I have used

SELECT old_url
    ,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

however I'm not sure what I can do to remove them now as I don't want to lose every single one, just the duplicates. They are almost a match completely apart from sometimes the new_url is different and the url_id (GUID) is different in each time

Hello World
  • 1,379
  • 4
  • 20
  • 41

3 Answers3

2

In my opinion ranking functions and a CTE are the easiest approach:

WITH CTE AS
(
    SELECT old_url
          ,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
    FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1

Change ORDER BY DateColumn ASC accordingly to determine which records should be deleted and which record should be left alone. In this case i delete all newer duplicates.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Hey, thank you very much! This seems to work great for me. I really appreciate the help and thanks for the clear explanation and links to information. – Hello World Aug 21 '13 at 07:18
0

If your table has a primary key then this is easy:

BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))

INSERT INTO #T VALUES 
    (1, 'foo'),
    (2, 'moo'),
    (3, 'foo'),
    (4, 'moo'),
    (5, 'foo'),
    (6, 'zoo'),
    (7, 'foo')

DELETE FROM #T WHERE Id NOT IN (
    SELECT MIN(Id) 
    FROM #T 
    GROUP BY OldUrl
    HAVING COUNT(OldUrl) = 1
    UNION 
    SELECT MIN(Id) 
    FROM #T 
    GROUP BY OldUrl
    HAVING COUNT(OldUrl) > 1)

SELECT * FROM #T

DROP TABLE #T

ROLLBACK
satnhak
  • 9,407
  • 5
  • 63
  • 81
0

this is the sample to delete multiple record with guid, hope it can help u=)

DECLARE @t1 TABLE
(
DupID UNIQUEIDENTIFIER,
DupRecords NVARCHAR(255)
)

INSERT INTO @t1 VALUES 
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')

so now, a duplicated record with guid is created in @t1

;WITH CTE AS(
SELECT DupID,DupRecords, Rn = ROW_NUMBER()
OVER (PARTITION BY DupRecords ORDER BY DupRecords)
FROM @t1 
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE RN>1)

with query above, duplicated record is deleted from @t1, i use Row_number() to distinct each of the records

SELECT * FROM @t1
Low Chee Mun
  • 610
  • 1
  • 4
  • 9