0

I am using SQL Server 2008 R2.

I found duplicate rows with this script:

SELECT CLDest, CdClient,
 COUNT(CLDest) AS NumOccurrences
FROM DEST
GROUP BY CLDest,CdClient
HAVING ( COUNT(CLDest) > 1 )

It return 48 entries

Before I delete I have to make sure that I delete the doubles:

SELECT DEST.CdClient
      ,DEST.CLDest
FROM [Soft8Exp_Client_WEB].[dbo].[DEST]
WHERE DEST.CdClient IN (SELECT  CdClient
                        FROM DEST
                        GROUP BY CdClient
                        HAVING (COUNT(CLDest) > 1) )
  AND DEST.CLDest IN (SELECT CLDest
                      FROM DEST
                      GROUP BY CLDest
                      HAVING (COUNT(CLDest) > 1) )

This query returns 64628 entries

So I suppose my select is wrong.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user609511
  • 4,091
  • 12
  • 54
  • 86
  • Duplicate: http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows Select everything into groups and keep the first item in every group. – MrFox Mar 03 '14 at 16:00
  • Why do you use a `and`clause? You first query found all doubles. So you don't need the second `in`in your query. Stop your query after the first subquery, and try it. – provençal le breton Mar 03 '14 at 16:01
  • the second query is for delete...when i return right entries then i can replace select with delete – user609511 Mar 03 '14 at 16:02
  • I meant, your query is like : `select id from where ida in() and idb in()`. Just try it like `select id from where ida in()` – provençal le breton Mar 03 '14 at 16:04

3 Answers3

2

SQL Server has the nice property of updatable CTEs. When combined with the function row_number(), this does what you want:

with todelete as (
      select d.*,
             row_number() over (partition by CLDest, CdClient order by newid()) as seqnum
      from dest d
     )
delete from todelete
    where seqnum > 1;

This version will randomly delete one of the duplicates. What it does is assign a sequential number to the rows with the same value and delete all but the first one found. If you want to keep something by date, then use a different expression in the order by.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1
;WITH Duplicates
 AS
   (
   SELECT CLDest
        , CdClient
        , ROW_NUMBER() OVER (PARTITION BY CLDest, CdClient ORDER BY CdClient) AS Rn
   FROM DEST
   )
DELETE FROM Duplicates
WHERE RN > 1
M.Ali
  • 67,945
  • 13
  • 101
  • 127
0
SELECT DEST.CdClient,DEST.CLDest 
FROM [Soft8Exp_Client_WEB].[dbo].[DEST]
WHERE DEST.CdClient+DEST.CLDest 
  IN (
    SELECT CdClient+CLDest FROM DEST GROUP BY CLDest HAVING ( COUNT(CLDest) > 1 )
   )
ahPo
  • 374
  • 3
  • 9