Delete duplicates, keeping one of them, grouping by

Question

I need to delete all the duplicates, all but one, for each of the table ids. Like the following. I need to delete all the duplicates on valueid for 01,02,03...

Original:

id      | valueid   |   data
____________________________
01      | 1001      |   datadata1       
01      | 1002      |   datadata2
01      | 1001      |   datadata1
02      | 1323      |   datamoredata123
02      | 1323      |   datamoredata123
03      | 22123     |   evenmoredata
03      | 24444     |   andalsomore

Should end like:

id      | valueid   |   data
____________________________
01      | 1001      |   datadata1       
01      | 1002      |   datadata2
02      | 1323      |   datamoredata123
03      | 22123     |   evenmoredata
03      | 24444     |   andalsomore

Was trying to do it with something like this, but I don´t get how can I group that delete on the id

WITH CTE AS(
   SELECT valueid,
       RN = ROW_NUMBER()OVER(PARTITION BY valueid ORDER BY valueid)
   FROM tblvalues
)
DELETE FROM CTE WHERE RN > 1

Any suggestions?

Thanks in advance

score 5 · Accepted Answer · answered Jan 07 '19 at 16:44

You need to add id column to the PARTITION:

WITH CTE AS(
   SELECT valueid,
       RN = ROW_NUMBER()OVER( PARTITION BY id, valueid ORDER BY data)
   FROM tblvalues
)
DELETE FROM CTE WHERE RN > 1

This way you delete duplicate valueid values separately for each id. Column data determines which duplicates are deleted.

Gordon Linoff · Answer 2 · 2019-01-07T16:46:52.350

3

You are pretty close. You need to change the partition by clause. You want one row per id/valueid pair, so these should both be in the partitioning clause:

WITH todelete AS (
   SELECT valueid,
       RN = ROW_NUMBER() OVER (PARTITION BY id, valueid ORDER BY data)
   FROM tblvalues
)
DELETE FROM todelete WHERE RN > 1;

edited Jan 07 '19 at 16:46

answered Jan 07 '19 at 16:43

Gordon Linoff

1,242,037
58
646
786

Additionally I would include the data column in the order by (after the valueid)... – Tyron78 Jan 07 '19 at 16:44

Fill · Answer 3 · 2019-01-07T17:16:52.537

-1

A very simple way to do this is to add the UNIQUE index in column (valueid). When you write an ALTER statement, specify the IGNORE keyword.

ALTER IGNORE TABLE tblvalues
ADD UNIQUE INDEX idx_name (valueid);

This will remove all duplicate rows. As an added advantage, future INSERTs that are duplicates will be erroneous. As always, you can take a backup before running something like this.

edited Jan 07 '19 at 17:16

answered Jan 07 '19 at 16:46

Fill

98
1
12

When posting a code answer, it's helpful to the OP to explain the the answer you've provided. In this case, how does this answer help the OP `DELETE` the duplicated rows? – Thom A Jan 07 '19 at 16:57
Thanks for the update, however, if you run that you'll get the error `Unknown object type 'IGNORE' used in a CREATE, DROP, or ALTER statement. `. [db<>fiddle](https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=5b2b2efce8a1a834585d54ad1adb237a) – Thom A Jan 07 '19 at 17:14
@Larnu update answer, sorry – Fill Jan 07 '19 at 17:17
The thing is, this is a already closed database design. Can´t really alter it right now. Thanks for your time and answer, though. – vvic Jan 07 '19 at 17:23

Delete duplicates, keeping one of them, grouping by

3 Answers3