0

I'm using SQL Server 2014 and utilizing the AdventureWorks2012 sample database provided by Microsoft.

I'm trying to delete duplicate rows using sub-query below (option #2):

/* Option #2: SUBQUERY */

--SELECT * FROM
DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS (SELECT * 
              FROM
                  (SELECT 
                       NAME,
                       ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
                   FROM  
                       SQLPractice.[dbo].[CURRENCY]) AS T
              WHERE Flag > 1) 
GO

But it deletes all rows from the table.

But the other option (CTE) does delete only the duplicate rows.

/*** Option #3: CTE ***/ 
;WITH RepFlag AS
(
    SELECT 
        NAME,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
    FROM 
        SQLPractice.[dbo].[CURRENCY]
)
--SELECT * FROM RepFlag
DELETE RepFlag
WHERE Flag > 1

SELECT * 
FROM SQLPractice.[dbo].[CURRENCY]

Please use the code below to create your own test table.

/*** REMOVING DUPLICATE ROWS OPTION ***/
-- Creating a table 
SELECT TOP 0 *
INTO [dbo].[CURRENCY]
FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

-- inserting duplicate rows 
INSERT [dbo].[CURRENCY]
SELECT * FROM AdventureWorks2012.Sales.Currency
WHERE NAME LIKE  '%A';

/***** SELECTING COUNT OF DUPLICATED ROWS *****/ 

/*** Option #1: "GROUP BY" with "HAVING" ***/ 
SELECT 
    NAME, COUNT(*) AS Qty   
FROM 
    SQLPractice.[dbo].[CURRENCY]
GROUP BY 
    NAME
HAVING 
    COUNT(*) >1
GO
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Data Engineer
  • 795
  • 16
  • 41

6 Answers6

2

If you wanted to delete the duplicate name using a subquery,use the following method.

DELETE t
FROM  (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) t
WHERE t.Flag > 1
GO

You can also achieve this using common table expression (CTE).

;WITH cte_1
AS (SELECT  NAME,ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
              FROM  SQLPractice.[dbo].[CURRENCY]
            ) 
DELETE FROM cte_1
WHERE Flag > 1
Unnikrishnan R
  • 4,965
  • 1
  • 12
  • 21
1

Option #2 deletes all rows because the Subquery inside EXISTS will always return rows for all the rows of the table. There must be some relation between subquery inside EXISTS and the parent query. The subquery must generate different results according to each row of the table. One option delete to duplicate rows using a subquery when table has an identity col is :

DELETE from SQLPractice.[dbo].[CURRENCY]
where identityCol not in ( select min(identityCol) FROM SQLPractice.[dbo].[CURRENCY] GROUP BY NAME)
Akshey Bhat
  • 8,227
  • 1
  • 20
  • 20
1

One of possible methods:

DELETE tt
FROM [your table] tt
   INNER JOIN

    (SELECT NAME, MIN(PK) AS MIN_KEY)
    FROM [your table]
    GROUP BY Name
    HAVING COUNT(*) > 1) dup ON dup.name = tt.name and tt.PK <> dup.MIN_KEY
Anton
  • 2,846
  • 1
  • 10
  • 15
  • Thank you Anton for your solution, but it will not going to work as my table has no Primary Key. Basically you are proposing similar to Akshey's solution. – Data Engineer Sep 19 '16 at 05:21
  • 1
    If you don't have PK, then you may use cursor or "WHILE loop + temp table". So for each duplicated name, you execute "DELETE TOP(xxx)..." where xxx is "[number of duplicates for current name] - 1". SET ROWCOUNT can be also used instead of DELETE TOP – Anton Sep 19 '16 at 05:59
  • 1
    alternatively you can copy distinct rows (for duplicates only) to temp table, delete all duplicates, and reinsert the data from temp table. – Anton Sep 19 '16 at 06:00
1

In your sample case, Row_Number() will not help you to solve your problem. Because the duplicate rows are identical even in the primary key (candidate field) which is the CurrencyCode

Since you simply insert the same row into the target table, the ModifiedDate field is also the same.

For the sample case, you can apply a solution described at delete duplicate rows where no primary key exists

You can test and see that below DELETE command will delete all rows in the table

delete [dbo].[CURRENCY]
from [dbo].[CURRENCY]
inner join (
    select ROW_NUMBER() over (partition by CurrencyCode order by ModifiedDate) rn, CurrencyCode, ModifiedDate from [dbo].[CURRENCY]
) dublicates
    on dublicates.CurrencyCode = [dbo].[CURRENCY].CurrencyCode and
       dublicates.ModifiedDate = [dbo].[CURRENCY].ModifiedDate
where dublicates.rn > 1

For example from the tutorial, cursor method is suggested You can use following

DECLARE @Count int
DECLARE @CurrencyCode varchar(10)
DECLARE @ModifiedDate datetime

DECLARE dublicate_cursor CURSOR FAST_FORWARD FOR
SELECT CurrencyCode, ModifiedDate, Count(*) - 1
FROM CURRENCY
GROUP BY CurrencyCode, ModifiedDate
HAVING Count(*) > 1

OPEN dublicate_cursor

FETCH NEXT FROM dublicate_cursor INTO @CurrencyCode, @ModifiedDate, @Count

WHILE @@FETCH_STATUS = 0
BEGIN

SET ROWCOUNT @Count
DELETE FROM CURRENCY WHERE CurrencyCode = @CurrencyCode AND ModifiedDate = @ModifiedDate
SET ROWCOUNT 0

FETCH NEXT FROM dublicate_cursor INTO @CurrencyCode, @ModifiedDate, @Count
END

CLOSE dublicate_cursor
DEALLOCATE dublicate_cursor
Eralper
  • 6,461
  • 2
  • 21
  • 27
0

With statement remove only duplicate rows because it collect all duplicate records and then perform delete operation.

While in your sub-query you haven't specify where condition on which records you wants to delete, it should be written as below:

DELETE SQLPractice.[dbo].[CURRENCY]
WHERE EXISTS  
(
    SELECT * FROM 
    (
        SELECT 
        NAME,
        ID,
        ROW_NUMBER () OVER (PARTITION BY NAME ORDER BY NAME) AS Flag
        FROM SQLPractice.[dbo].[CURRENCY] 
    )   AS T
    WHERE Flag > 1 AND T.ID=[CURRENCY].ID
) 
0

you can try this by this query just duplicate records will be deleted i done this one base on currency duplicate values it deletes all the duplicate values

delete from test where currency in(select currency from test group by currency having count(*) >1)

Asifuzzaman Redoy
  • 1,773
  • 1
  • 15
  • 30