1

I have a table where unique records can be identified using a concatenation of ORDER and ITEM, like so:

ORDER ITEM
1234 001
1234 002
1235 001
1235 002
1235 003
1236 001
1237 001
... ...

The table currently has duplicates of each unique combination, like so:

ORDER ITEM
1234 001
1234 002
1235 001
1235 002
1235 003
1236 001
1237 001
1234 001
1234 002
1235 001
1235 002
1235 003
1236 001
1237 001
... ...

I'm using the following subquery to select all duplicate rows:

SELECT * FROM (
    SELECT order + item AS ID,
    Row_Number() OVER(PARTITION BY order, item ORDER BY order) as CN
    FROM [schema].[table]
    ) 
AS Q WHERE Q.CN > 1

With the result:

ID CN
1234001 2
1234001 3
1234001 4
1234002 2
1234002 3
1234002 4
1235001 2
1235001 3
1235001 4
... ...

Forgive me if this is a very simple question, but is there a simple modification to turn the above subquery into a delete statement and remove every record it currentlyt selects (i.e. with CN > 1)?

3PLanalyst
  • 25
  • 5
  • See https://stackoverflow.com/a/30244183/2055998 – PM 77-1 Mar 08 '23 at 18:34
  • the table only has those 2 columns? if that is the case, you dont need to delete anything, instead, using `SELECT DISTINCT [Order], [ITEM] FROM table`, if there exists the other columns, you need to take into the consideration which row should be kept when there is more than 1 row returned for that pair. – LONG Mar 08 '23 at 19:19
  • If records are unique based on these values, why isn't there a constraint on the table for this? – JimmyV Mar 08 '23 at 19:29
  • Thank you for your reply, there are, indeed, other columns. – 3PLanalyst Mar 08 '23 at 19:31

3 Answers3

2

You can use the below to delete duplicate record

DELETE FROM [schema].[table]
WHERE order + item IN (
    SELECT order + item
    FROM [schema].[table]
    GROUP BY order, item
    HAVING COUNT(*) > 1
)
Abdulmajeed
  • 1,502
  • 2
  • 10
  • 13
  • A simple solution, indeed -- thank you. I'm sorry to ask this, but I fearI shouldn't have shared the table name, so I changed it to a generic [schema].[table] name in the initial question. Would you mind making this small revision so it reflects accurately? Thank you in advance, and apologies for this oversight. – 3PLanalyst Mar 08 '23 at 19:30
2

also you can use self join like :

DELETE t1 FROM [schema].[table] t1 INNER JOIN (
    SELECT order, item
    FROM [schema].[table]
    GROUP BY order, item
    HAVING COUNT(*) > 1 ) t2 ON t1.order = t2.order AND t1.item = t2.item;
2

Using your subquery with a CTE to get the duplicated records, then a simple delete with where clause cn > 1

with cte as (
    SELECT _order, item,
    Row_Number() OVER(PARTITION BY _order, item ORDER BY _order) as CN
    FROM mytable
)
Delete from cte WHERE cn > 1;

Demo here

SelVazi
  • 10,028
  • 2
  • 13
  • 29