-2

I have a SQL Server table with ~100 columns including the columns Id and CreationDate. Due to a bad constraint in its initial design, there are now many duplicate rows (i.e. rows whose values are identical across ALL columns).

Can you suggest a script to remove those duplicate rows?

Also, what would be a script to select all distinct Ids with the latest CreationDate?

Thanks

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Hanch
  • 35
  • 4

1 Answers1

1

You can use the following script to remove duplicate rows from a Microsoft SQL Server table:

SELECT DISTINCT *
  INTO duplicate_table
  FROM original_table
  GROUP BY key_value
  HAVING COUNT(key_value) > 1

  DELETE original_table
      WHERE key_value
      IN (SELECT key_value
             FROM duplicate_table)


INSERT original_table
      SELECT *
         FROM duplicate_table

 DROP TABLE duplicate_table

When this script is executed, it follows these steps:

  1. It moves one instance of any duplicate row in the original table to a duplicate table.

  2. It deletes all rows from the original table that also reside in the duplicate table.

  3. It moves the rows in the duplicate table back into the original table.

  4. It drops the duplicate table.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
yaghob abbasi
  • 96
  • 1
  • 12