Delete duplicate records using a concatenated column in TSQL table

Question

I have a table where unique records can be identified using a concatenation of ORDER and ITEM, like so:

ORDER	ITEM
1234	001
1234	002
1235	001
1235	002
1235	003
1236	001
1237	001
...	...

The table currently has duplicates of each unique combination, like so:

ORDER	ITEM
1234	001
1234	002
1235	001
1235	002
1235	003
1236	001
1237	001
1234	001
1234	002
1235	001
1235	002
1235	003
1236	001
1237	001
...	...

I'm using the following subquery to select all duplicate rows:

SELECT * FROM (
    SELECT order + item AS ID,
    Row_Number() OVER(PARTITION BY order, item ORDER BY order) as CN
    FROM [schema].[table]
    ) 
AS Q WHERE Q.CN > 1

With the result:

ID	CN
1234001	2
1234001	3
1234001	4
1234002	2
1234002	3
1234002	4
1235001	2
1235001	3
1235001	4
...	...

Forgive me if this is a very simple question, but is there a simple modification to turn the above subquery into a delete statement and remove every record it currentlyt selects (i.e. with CN > 1)?

the table only has those 2 columns? if that is the case, you dont need to delete anything, instead, using `SELECT DISTINCT [Order], [ITEM] FROM table`, if there exists the other columns, you need to take into the consideration which row should be kept when there is more than 1 row returned for that pair. — LONG, Mar 08 '23 at 19:19
If records are unique based on these values, why isn't there a constraint on the table for this? — JimmyV, Mar 08 '23 at 19:29

Abdulmajeed · Accepted Answer · 2023-03-08T19:45:46.830

2

You can use the below to delete duplicate record

DELETE FROM [schema].[table]
WHERE order + item IN (
    SELECT order + item
    FROM [schema].[table]
    GROUP BY order, item
    HAVING COUNT(*) > 1
)

edited Mar 08 '23 at 19:45

answered Mar 08 '23 at 18:48

Abdulmajeed

1,502
2
10
13

A simple solution, indeed -- thank you. I'm sorry to ask this, but I fearI shouldn't have shared the table name, so I changed it to a generic [schema].[table] name in the initial question. Would you mind making this small revision so it reflects accurately? Thank you in advance, and apologies for this oversight. – 3PLanalyst Mar 08 '23 at 19:30

score 2 · Answer 2 · answered Mar 08 '23 at 19:50

2

also you can use self join like :

DELETE t1 FROM [schema].[table] t1 INNER JOIN (
    SELECT order, item
    FROM [schema].[table]
    GROUP BY order, item
    HAVING COUNT(*) > 1 ) t2 ON t1.order = t2.order AND t1.item = t2.item;

answered Mar 08 '23 at 19:50

fsafarkhani

71
3

score 2 · Answer 3 · answered Mar 08 '23 at 19:50

2

Using your subquery with a CTE to get the duplicated records, then a simple delete with where clause cn > 1

with cte as (
    SELECT _order, item,
    Row_Number() OVER(PARTITION BY _order, item ORDER BY _order) as CN
    FROM mytable
)
Delete from cte WHERE cn > 1;

Demo here

answered Mar 08 '23 at 19:50

SelVazi

10,028
2
13
29

1

This is perfect, thank you. I accepted another answer but I will likely use this. Again, thank you. – 3PLanalyst Mar 08 '23 at 20:14

Delete duplicate records using a concatenated column in TSQL table

3 Answers3