Delete duplicate rows using Sub-query

Question

I'm using SQL Server 2014 and utilizing the AdventureWorks2012 sample database provided by Microsoft.

I'm trying to delete duplicate rows using the subquery below (option #2):

/* Option #2: SUBQUERY */

--SELECT * FROM
DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS (SELECT * 
              FROM
                  (SELECT 
                       NAME,
                       ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
                   FROM  
                       SQLPractice.[dbo].[CURRENCY]) AS T
              WHERE Flag > 1) 
GO

However, it deletes all rows from the table.

The other option (CTE), by contrast, deletes only the duplicate rows.

/*** Option #3: CTE ***/ 
;WITH RepFlag AS
(
    SELECT 
        NAME,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
    FROM 
        SQLPractice.[dbo].[CURRENCY]
)
--SELECT * FROM RepFlag
DELETE RepFlag
WHERE Flag > 1

SELECT * 
FROM SQLPractice.[dbo].[CURRENCY]

Please use the code below to create your own test table.

/*** REMOVING DUPLICATE ROWS OPTION ***/
-- Creating a table 
SELECT TOP 0 *
INTO [dbo].[CURRENCY]
FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

-- inserting duplicate rows 
INSERT [dbo].[CURRENCY]
SELECT * FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

/***** SELECTING COUNT OF DUPLICATED ROWS *****/ 

/*** Option #1: "GROUP BY" with "HAVING" ***/ 
SELECT 
    NAME, COUNT(*) AS Qty   
FROM 
    SQLPractice.[dbo].[CURRENCY]
GROUP BY 
    NAME
HAVING 
    COUNT(*) >1
GO

score 2 · Answer 1 · answered Sep 19 '16 at 05:55

If you want to delete the duplicate names using a subquery, use the following method.

DELETE t
FROM  (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) t
WHERE t.Flag > 1
GO

You can also achieve this using a common table expression (CTE).

;WITH cte_1
AS (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) 
DELETE FROM cte_1
WHERE Flag > 1

Akshey Bhat · Answer 2 · 2016-09-19T05:00:13.613

1

Option #2 deletes all rows because the subquery inside EXISTS is not correlated with the outer DELETE: it returns the same rows no matter which row of the table is being evaluated, so EXISTS is true for every row as long as at least one duplicate exists. There must be some relation between the subquery inside EXISTS and the parent query, so that the subquery produces a different result for each row of the table. One option for deleting duplicate rows with a subquery, when the table has an identity column, is:

DELETE from SQLPractice.[dbo].[CURRENCY]
where identityCol not in ( select min(identityCol) FROM SQLPractice.[dbo].[CURRENCY] GROUP BY NAME)
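
The effect of a missing correlation can be checked without deleting anything. The following is a minimal sketch on a hypothetical temporary table (the name #Demo and its sample values are assumptions for illustration, not part of the question); it counts the rows each WHERE clause would match.

```sql
-- Hypothetical scratch table: one duplicated name and one unique name.
CREATE TABLE #Demo (Name varchar(50) NOT NULL);
INSERT INTO #Demo (Name) VALUES ('Afghani'), ('Afghani'), ('Kwanza');

-- Uncorrelated EXISTS: the inner query never looks at the outer row,
-- so it is true for every row once any duplicate exists (matches all 3 rows).
SELECT COUNT(*) AS RowsMatched
FROM #Demo AS d
WHERE EXISTS (SELECT 1
              FROM (SELECT Name,
                           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS Flag
                    FROM #Demo) AS T
              WHERE T.Flag > 1);

-- Correlated on Name only: true for BOTH copies of 'Afghani' (matches 2 rows),
-- so a DELETE with this predicate would still leave no copy of the duplicated name.
SELECT COUNT(*) AS RowsMatched
FROM #Demo AS d
WHERE EXISTS (SELECT 1
              FROM (SELECT Name,
                           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS Flag
                    FROM #Demo) AS T
              WHERE T.Flag > 1
                AND T.Name = d.Name);

DROP TABLE #Demo;
```

The second query shows why correlating on a non-unique column is not enough: the predicate needs a value that tells the copies apart, such as an identity column.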

edited Sep 19 '16 at 05:00

answered Sep 19 '16 at 04:47

Akshey Bhat


Yes, Thanks a lot. I thought about it. Just wanted to know how to bypass it without altering the table definition. – Data Engineer Sep 19 '16 at 05:09
You can use cte – Akshey Bhat Sep 19 '16 at 05:15
I can. And I used it. As shown on the code above. Just wanted to explore different options. – Data Engineer Sep 19 '16 at 14:56

score 1 · Answer 3 · answered Sep 19 '16 at 04:52

One possible method:

-- Keep the row with the smallest key per name and delete the other copies
DELETE tt
FROM [your table] tt
    INNER JOIN
    (SELECT NAME, MIN(PK) AS MIN_KEY
     FROM [your table]
     GROUP BY NAME
     HAVING COUNT(*) > 1) dup
        ON dup.NAME = tt.NAME AND tt.PK <> dup.MIN_KEY

answered Sep 19 '16 at 04:52

Anton


Thank you Anton for your solution, but it is not going to work as my table has no primary key. Basically, you are proposing a solution similar to Akshey's. – Data Engineer Sep 19 '16 at 05:21
If you don't have a PK, then you may use a cursor or a "WHILE loop + temp table". For each duplicated name, you execute "DELETE TOP(xxx)..." where xxx is "[number of duplicates for current name] - 1". SET ROWCOUNT can also be used instead of DELETE TOP – Anton Sep 19 '16 at 05:59
Alternatively, you can copy distinct rows (for duplicates only) to a temp table, delete all duplicates, and reinsert the data from the temp table. – Anton Sep 19 '16 at 06:00
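
The temp-table approach from the last comment could look roughly like the sketch below. It assumes the current database is SQLPractice, that dbo.CURRENCY has the three columns copied from AdventureWorks2012.Sales.Currency (CurrencyCode, Name, ModifiedDate), and that duplicated rows are identical in every column. The temp table name #Keep is hypothetical.

```sql
BEGIN TRANSACTION;

-- 1. Copy one distinct copy of every duplicated row into a temp table.
SELECT DISTINCT CurrencyCode, Name, ModifiedDate
INTO #Keep
FROM dbo.CURRENCY
WHERE Name IN (SELECT Name
               FROM dbo.CURRENCY
               GROUP BY Name
               HAVING COUNT(*) > 1);

-- 2. Delete every copy of the duplicated rows from the base table.
DELETE FROM dbo.CURRENCY
WHERE Name IN (SELECT Name FROM #Keep);

-- 3. Put the single saved copy of each row back.
INSERT INTO dbo.CURRENCY (CurrencyCode, Name, ModifiedDate)
SELECT CurrencyCode, Name, ModifiedDate
FROM #Keep;

COMMIT TRANSACTION;

DROP TABLE #Keep;
```

Wrapping the three steps in a transaction keeps the table from being left without the saved rows if a step fails partway through.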

Eralper · Answer 4 · 2016-09-19T05:46:35.777

In your sample case, ROW_NUMBER() will not help you solve the problem, because the duplicate rows are identical even in the primary key (candidate field), which is CurrencyCode.

Since you simply insert the same row into the target table, the ModifiedDate field is also the same.

For the sample case, you can apply the solution described in "delete duplicate rows where no primary key exists".

You can test and see that the DELETE command below deletes all rows in the table:

delete [dbo].[CURRENCY]
from [dbo].[CURRENCY]
inner join (
    select ROW_NUMBER() over (partition by CurrencyCode order by ModifiedDate) rn, CurrencyCode, ModifiedDate from [dbo].[CURRENCY]
) dublicates
    on dublicates.CurrencyCode = [dbo].[CURRENCY].CurrencyCode and
       dublicates.ModifiedDate = [dbo].[CURRENCY].ModifiedDate
where dublicates.rn > 1

The tutorial suggests a cursor-based method, for example. You can use the following:

DECLARE @Count int
DECLARE @CurrencyCode varchar(10)
DECLARE @ModifiedDate datetime

DECLARE dublicate_cursor CURSOR FAST_FORWARD FOR
SELECT CurrencyCode, ModifiedDate, Count(*) - 1
FROM CURRENCY
GROUP BY CurrencyCode, ModifiedDate
HAVING Count(*) > 1

OPEN dublicate_cursor

FETCH NEXT FROM dublicate_cursor INTO @CurrencyCode, @ModifiedDate, @Count

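WHILE @@FETCH_STATUS = 0
BEGIN
    -- Limit the next DELETE to the surplus copies of this group, keeping one row
    SET ROWCOUNT @Count
    DELETE FROM CURRENCY WHERE CurrencyCode = @CurrencyCode AND ModifiedDate = @ModifiedDate
    -- Reset the limit so later statements are not affected
    SET ROWCOUNT 0

    FETCH NEXT FROM dublicate_cursor INTO @CurrencyCode, @ModifiedDate, @Count
END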

CLOSE dublicate_cursor
DEALLOCATE dublicate_cursor
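
Microsoft's documentation notes that SET ROWCOUNT is planned to stop affecting DELETE, INSERT, and UPDATE statements in a future release and recommends TOP instead. As a hedged alternative, the same idea can be sketched with DELETE TOP and a plain WHILE loop instead of a cursor. This assumes the current database is SQLPractice and that dbo.CURRENCY has the CurrencyCode and ModifiedDate columns copied from Sales.Currency; the variable names are illustrative.

```sql
DECLARE @CurrencyCode nchar(3);
DECLARE @ModifiedDate datetime;
DECLARE @Extra int;

WHILE 1 = 1
BEGIN
    -- Pick one group that still has more than one copy.
    SELECT TOP (1)
           @CurrencyCode = CurrencyCode,
           @ModifiedDate = ModifiedDate,
           @Extra        = COUNT(*) - 1
    FROM dbo.CURRENCY
    GROUP BY CurrencyCode, ModifiedDate
    HAVING COUNT(*) > 1;

    -- No group was returned, so no duplicates are left.
    IF @@ROWCOUNT = 0 BREAK;

    -- Delete all but one copy of that group.
    DELETE TOP (@Extra)
    FROM dbo.CURRENCY
    WHERE CurrencyCode = @CurrencyCode
      AND ModifiedDate = @ModifiedDate;
END
```

Because the copies are identical, it does not matter which of them DELETE TOP removes; exactly one row per group remains after each pass.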

Sandip - Frontend Developer · Answer 5 · 2016-09-19T05:10:10.577

0

The WITH statement removes only the duplicate rows because it collects all duplicate records and then performs the delete operation.

In your subquery, however, you haven't specified a WHERE condition identifying which records you want to delete. It should be written as below:

DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS  
(
    SELECT * FROM 
    (
        SELECT 
        NAME,
        ID,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
        FROM SQLPractice.[dbo].[CURRENCY] 
    )   AS T
    WHERE Flag > 1 AND T.ID=[CURRENCY].ID
)

edited Sep 19 '16 at 05:10

answered Sep 19 '16 at 04:51

Sandip - Frontend Developer


Won't this also delete all rows for currencies that occur more than once? – Akshey Bhat Sep 19 '16 at 04:54
It will remove duplicate items, so you have one record for each currency – Sandip - Frontend Developer Sep 19 '16 at 04:55
E.g. if 'US Dollar' occurs twice, then the inner query will return rows for both of these rows. So both of them will be deleted – Akshey Bhat Sep 19 '16 at 04:57
But the OP has put the condition WHERE Flag > 1, so it does not remove the first-flagged row; that is, among duplicate rows it deletes all rows except the first one – Sandip - Frontend Developer Sep 19 '16 at 04:58
But in my case the value of column "Flag" will be 2 for both rows. Hence both will be deleted. – Akshey Bhat Sep 19 '16 at 05:01
Thanks Sandip. But the solution you proposed removes all rows. This is not going to work. – Data Engineer Sep 19 '16 at 05:06
Which is your PK column? Please apply the WHERE on the PK column; it works if you replace NAME with the PK column (i.e. ID) – Sandip - Frontend Developer Sep 19 '16 at 05:09
Sandip, there is no primary key – Data Engineer Sep 19 '16 at 05:16
Then how did you accept Akshay's answer, whose logic relies on a PK/identity column? Sorry @akshay, don't take it personally; I just want the OP to highlight the PK portion in your answer – Sandip - Frontend Developer Sep 19 '16 at 05:18

score 0 · Answer 6 · answered Sep 19 '16 at 05:21

You can try this query; with it, only the duplicate records are deleted. I based it on duplicated currency values, and it deletes all the duplicate values.

delete from test where currency in(select currency from test group by currency having count(*) >1)

answered Sep 19 '16 at 05:21

Asifuzzaman Redoy


Thanks, but this removes all rows. So, this is not going to work. – Data Engineer Sep 19 '16 at 05:27
