-3

Problem: I am not sure how to go about deleting all "duplicate" records so that only distinct records are left.

So from this:

╔══════╦════════════╦════════╦════╗
║ date ║ dupe_count ║ field1 ║ id ║
╠══════╬════════════╬════════╬════╣
║ x    ║         48 ║ y      ║ a  ║
║ x    ║         48 ║ y      ║ b  ║
║ x    ║         48 ║ y      ║ c  ║
║ x    ║         48 ║ y      ║ d  ║
║ x    ║         48 ║ y      ║ e  ║
║ x    ║         48 ║ y      ║ f  ║
║ x    ║         48 ║ y      ║ g  ║
║ x    ║         48 ║ y      ║ h  ║
║ x    ║         48 ║ y      ║ i  ║
╚══════╩════════════╩════════╩════╝

To this:

╔══════╦════════════╦════════╦════╗
║ date ║ dupe_count ║ field1 ║ id ║
╠══════╬════════════╬════════╬════╣
║ x    ║          1 ║ y      ║ a  ║
╚══════╩════════════╩════════╩════╝

Any ideas?

Drew
  • 2,583
  • 5
  • 36
  • 54
  • @Drew query is not tough.but explanation of data is important to get best query .what is dupe_count ?and why it is 48.why is id non-numeric ?why dupe_count will become 1.it mean what ? – KumarHarsh Feb 17 '14 at 14:31

1 Answers1

2

SQL Server has the nice feature of updatable CTEs. So, you can do:

with todelete as (
      select t.*, row_number() over (partition by dupe_count, field1 order by id) as seqnum
      from table t
     )
delete from todelete
    where seqnum > 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786