I have an INSEE
table in my SQL Server 2008 R2. I have some duplicate values (15 000 lines out of 54 000 have to be deleted). This is my current script
declare @datainsee table
(
di_nom varchar(100),
di_departement varchar(5),
di_type varchar
)
declare @datainseeidentifiant table
(
dii_nom varchar(100),
dii_departement varchar(5),
dii_type varchar,
dii_identifiant uniqueidentifier
)
insert into @datainsee select iee_nom,iee_departement,iee_type from insee
group by iee_nom,iee_departement,iee_type
having(count(*)>1)
insert into @datainseeidentifiant
select di_nom,di_code,di_type,
(select top 1 iee_guid from insee where iee_departement=di_departement and iee_nom=di_nom and iee_type= di_type) from @datainsee
delete from insee
where iee_guid not in (select dii_identifiant from @datainseeidentifiant) and iee_nom in (select dii_nom from @datainseeidentifiant)
Rows are considered as duplicate if they have the same iee_nom
, iee_departement
and iee_type
.
I tried to look at solutions like T-SQL: Deleting all duplicate rows but keeping one but it doesn't work in my case because GUID
is not number..
Any ideas to optimise my query? One minute is too long for my application.