0

I have an INSEE table in my SQL Server 2008 R2. I have some duplicate values (15 000 lines out of 54 000 have to be deleted). This is my current script

    declare @datainsee table
    (
        di_nom varchar(100),
        di_departement varchar(5),
        di_type varchar
    )
    declare @datainseeidentifiant table
    (
        dii_nom varchar(100),
        dii_departement varchar(5),
        dii_type varchar,
        dii_identifiant uniqueidentifier
    )
    insert into @datainsee select iee_nom,iee_departement,iee_type from insee
    group by iee_nom,iee_departement,iee_type 
    having(count(*)>1)

    insert into @datainseeidentifiant 
    select di_nom,di_code,di_type,
    (select top 1 iee_guid from insee where iee_departement=di_departement and iee_nom=di_nom and iee_type= di_type) from @datainsee

    delete from insee
    where iee_guid not in (select dii_identifiant from @datainseeidentifiant) and iee_nom in (select dii_nom from @datainseeidentifiant)

Rows are considered as duplicate if they have the same iee_nom, iee_departement and iee_type. I tried to look at solutions like T-SQL: Deleting all duplicate rows but keeping one but it doesn't work in my case because GUID is not number..

Any ideas to optimise my query? One minute is too long for my application.

Community
  • 1
  • 1

3 Answers3

2

With the help of a CTE and ROW_NUMBER.

This should be enough:

with x as (
select
    ROW_NUMBER() OVER (
        PARTITION BY
            iee_nom, iee_departemen, iee_type
        ORDER BY
            <pick your priority column here>
    )rID,
    *
from insee
)
delete from x where rID > 1
mxix
  • 3,539
  • 1
  • 16
  • 23
  • Dont work for me.. Msg 252 - Recursive common table expression 'insee' does not contain a top-level UNION ALL operator. – The Dark Boz Mar 30 '16 at 15:52
  • @ErenAtolgan what name did you give to the CTE? insee? give it a different name like i did in the example. And do your delete over that alias. – mxix Mar 30 '16 at 16:15
1

I suggest you to use traditional way like below, Using Temp Table

SELECT DISTINCT * INTO #TmpTable FROM insee

DELETE FROM insee
--OR Use Truncate to delete records

INSERT INTO insee  SELECT * FROM #TmpTable

DROP TABLE #TmpTable
Jaydip Jadhav
  • 12,179
  • 6
  • 24
  • 40
1

Assuming you have ID(Auto increment primary key) column in table.

DELETE n1 FROM insee n1
        INNER JOIN
    insee n2 ON n1.iee_nom = n2.iee_nom
        AND n1.iee_departement = n2.iee_departement
        AND n1.iee_type = n2.iee_type
        AND n1.id > n2.id;
Abhishek Ginani
  • 4,511
  • 4
  • 23
  • 35