Deleting "Duplicate" Records

Question

Problem: I am not sure how to go about deleting all "duplicate" records so that only distinct records are left.

So from this:

╔══════╦════════════╦════════╦════╗
║ date ║ dupe_count ║ field1 ║ id ║
╠══════╬════════════╬════════╬════╣
║ x    ║         48 ║ y      ║ a  ║
║ x    ║         48 ║ y      ║ b  ║
║ x    ║         48 ║ y      ║ c  ║
║ x    ║         48 ║ y      ║ d  ║
║ x    ║         48 ║ y      ║ e  ║
║ x    ║         48 ║ y      ║ f  ║
║ x    ║         48 ║ y      ║ g  ║
║ x    ║         48 ║ y      ║ h  ║
║ x    ║         48 ║ y      ║ i  ║
╚══════╩════════════╩════════╩════╝

To this:

╔══════╦════════════╦════════╦════╗
║ date ║ dupe_count ║ field1 ║ id ║
╠══════╬════════════╬════════╬════╣
║ x    ║          1 ║ y      ║ a  ║
╚══════╩════════════╩════════╩════╝

Any ideas?

@Drew query is not tough.but explanation of data is important to get best query .what is dupe_count ?and why it is 48.why is id non-numeric ?why dupe_count will become 1.it mean what ? — KumarHarsh, Feb 17 '14 at 14:31

score 2 · Accepted Answer · answered Feb 17 '14 at 14:03

2

SQL Server has the nice feature of updatable CTEs. So, you can do:

with todelete as (
      select t.*, row_number() over (partition by dupe_count, field1 order by id) as seqnum
      from table t
     )
delete from todelete
    where seqnum > 1;

answered Feb 17 '14 at 14:03

Gordon Linoff

1,242,037
58
646
786

Deleting "Duplicate" Records

1 Answers1