0

Due to a bug, I now have the possibility that certain tables have rows with duplicate data in primary key columns.

Let's say I have table T with primary key columns A, B, C and D, and non-PK columns E, F and G. In order for a row to be unique, A B C and D all must have unique values. I could have rows where A is the same, or A and B have the same values, or even A B and C. But if I have two rows where A B C and D all have the same value, that would be a problem.

Would this be the correct approach to find such an occurrence:

SELECT A, B, C, D, COUNT(*) AS 'Duplicates' FROM T
   GROUP BY A, B, C, D
   HAVING COUNT(*) > 1

Thanks for any assistance.

Greg
  • 15
  • 4
  • How did u end up with duplicate primary keys in the first place? Wouldn't constrain throw an exception? That said - yes, the above query should give u the dupes. – Yuriy Galanter May 15 '14 at 21:26
  • Do you mean a composite key or foreign keys from other tables? Other wise query looks fine. – Tanner May 15 '14 at 22:02

2 Answers2

0

The following query will return all the tuples where colA, colB, colC and colD are duplicated. I actually use this code at work to remove duplicate entries from a table. (Switch the select at the end to a delete, removes any duplicates while leaving one entry in the table)

with a as
   (SELECT 
         colA
         ,colB
         ,colC
         ,colD
      ,ROW_NUMBER() OVER(PARTITION by colA
                                             ,colB
                                             ,colC
                                             ,colD
                               ) as duplicateRecCount
    FROM Table)

 Select * from a 
 where duplicateRecCount > 1
orgtigger
  • 3,914
  • 1
  • 20
  • 22
  • Try to improve your answer. – inf3rno May 15 '14 at 22:03
  • @inf3rno - more specific please, the code provided should return all the data from the table where the 4 field Primary key has been duplicated. I use a modified version of this same code at work to remove duplicate entries from a staging table. What else should my code include? – orgtigger May 15 '14 at 22:35
  • To be more specific: format the code according to a coding standard: http://stackoverflow.com/questions/522356/what-sql-coding-standard-do-you-follow and write some text in which you describe what it does, or why it works. (Somebody flagged your answer as low quality, if you don't improve it, I think it will be deleted.) – inf3rno May 15 '14 at 22:52
0

Aren't you using declarative referential integrity? If not, why not?

Something like this ought to do you:

with duplicate_row as
(
  select distinct
         x.A ,
         x.B ,
         x.C ,
         x.D ,
         x.E ,
         x.F ,
         x.G
  from ( select * ,
                seq = row_number() over (
                        partition by A,B,C,D
                        order by E,F,G
                        )
         from dbo.my_table
       ) x
  where x.seq > 1
)
delete dbo.my_table
from dbo.my_table  t
join duplicate_row d on d.A = t.A -- IMPORTANT:
                    and d.B = t.B -- you must join against ALL
                    and d.C = t.C -- columns, key and non-key
                    and d.D = t.D -- lest you blow away data
                    and d.E = t.E -- inadvertantly
                    and d.F = t.F
                    and d.G = t.G
GO

alter table dbo.my_table add constraint
  my_table_PK primary key clustered (A,B,C,D)
GO
Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135