1

We have done a translations api were our users can add translations to the system. A bug in the api added duplicate rows which i need to remove.

enter image description here

The translationConnection_Id combined with TranslationCompanyKey_Id is a key and therefor should not be able to be duplicate. I since i'm a real sucker at SQL i need some help to create a script to remove all duplicates but saves one of the rows.

SELECT TOP 1000 [Id]
  ,[Value]
  ,[TranslationConnection_Id]
  ,[TranslationCompanyKey_Id]
 FROM [AAES_TRAN].[dbo].[Translations]
Daniel Gustafsson
  • 1,725
  • 7
  • 37
  • 78

3 Answers3

5

I think this will work. Try the SELECT part first to be sure.

DELETE FROM tblTranslations WHERE ID IN (
    SELECT MAX(ID) FROM tblTranslations
    GROUP BY TranslationConnection_Id, TranslationCompanyKey_Id
    HAVING COUNT(*) > 1
)
idstam
  • 2,848
  • 1
  • 21
  • 30
0

You can use ROW_NUMBER() analytic function for this like

SELECT * FROM (
SELECT TOP 1000 [Id]
  ,[Value]
  ,[TranslationConnection_Id]
  ,[TranslationCompanyKey_Id]
  ,ROW_NUMBER() OVER(PARTITION BY [Value] ORDER BY [Value]) AS rn
 FROM [AAES_TRAN].[dbo].[Translations] ) xxx
WHERE rn = 1;
Rahul
  • 76,197
  • 13
  • 71
  • 125
0

I think you would like to keep one record from the duplicate records.

This code will delete duplicate records but will keep one records. Try this:

DELETE FROM TestTranslationTable WHERE ID IN(
     SELECT Id FROM TestTranslationTable tblMain
     WHERE tblMain.Id NOT IN (SELECT TOP 1 tblNested.id FROM TestTranslationTable tblNested WHERE 
     tblNested.TranslationConnectionId=tblMain.TranslationConnectionId ORDER BY tblNested.TranslationConnectionId)
     AND tblMain.TranslationConnectionId IN(SELECT TranslationConnectionId FROM TestTranslationTable 
     GROUP BY TranslationConnectionId HAVING COUNT(TranslationConnectionId)>1 )
     )
     GO

Ref: https://jwcooney.com/2011/11/18/sql-server-delete-duplicate-records-except-the-first-duplicate-record/

Azhar
  • 107
  • 12