
I have a database table with 33,265,902 records in it.

Much of this is made up of duplicates.

It is causing me issues, as any website that queries this table puts significant load on the server.

What should make a record unique is the combination of two fields: email_address and job_id.

My question is: what is the best way to delete all the duplicates, leaving one good record per (email_address, job_id) pair?

And what is the most efficient way to do this without killing the server?

Matthew Stott
  • Possible duplicate of [How to delete duplicate rows in SQL Server?](https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server) – Thom A Jul 23 '19 at 10:17

6 Answers


I would prefer creating a new table from the existing one using DISTINCT values, then dropping the old table and renaming the newly created table. The full sequence of steps (sketched below) is:

1. select distinct columns and insert into new_table
2. drop old_table
3. rename new_table to old_table
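
A minimal T-SQL sketch of those steps (old_table and new_table are placeholder names; note that DISTINCT only collapses rows that are identical in every selected column):

-- 1. copy the de-duplicated rows into a new table
SELECT DISTINCT email_address, job_id   -- plus whichever other columns you need to keep
INTO new_table
FROM old_table;

-- 2. drop the original table
DROP TABLE old_table;

-- 3. rename the new table to the original name
EXEC sp_rename 'new_table', 'old_table';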
Zaynul Abadin Tuhin

You can use LIMIT:

DELETE FROM table [WHERE conditions] LIMIT row_count

Call this query every 5 minutes, limiting each run to about 10,000 rows (or a bit more). Note that LIMIT on DELETE is MySQL syntax; on SQL Server the equivalent is DELETE TOP (n), as sketched below.
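
A sketch of that batched approach in T-SQL, reusing the ROW_NUMBER de-duplication pattern from the other answers (table_name is the placeholder used elsewhere on this page; the 10,000-row batch size is the suggestion above):

WHILE 1 = 1
BEGIN
    ;WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY email_address, job_id
                                  ORDER BY (SELECT NULL)) AS rn
        FROM table_name
    )
    DELETE TOP (10000) FROM cte   -- small batches keep locks and log growth manageable
    WHERE rn > 1;

    IF @@ROWCOUNT = 0 BREAK;      -- stop once no duplicates remain
END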


The most efficient way is probably to keep one row per (email_address, job_id) pair in a new table, then truncate the old table and re-insert the values:

select t.*  -- or select all columns except seqnum
into temp_table
from (select t.*,
             row_number() over (partition by email_address, job_id order by job_id) as seqnum
      from t
     ) t
where seqnum = 1;

-- optional if you remove seqnum above
-- alter table temp_table drop column seqnum;


-- back this up first!
truncate table t;

insert into t
    select * from temp_table;

You could instead drop the original table and rename temp_table to the table name. If you do so, remember to re-create constraints, indexes, triggers, partitions, and whatever else is on the original table.
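
Whichever route you take, a unique index on the two key columns (my suggestion, not part of the original answer) would stop duplicates from accumulating again:

-- prevent future duplicates on the de-duplication key
CREATE UNIQUE INDEX ux_t_email_job
    ON t (email_address, job_id);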

This will not be super fast (you can gauge it by timing the creation of the temporary table). However, it avoids the locking and logging involved in deleting rows from the original table. As a bonus, the new table should be less fragmented than if you did the deletes in place.

Gordon Linoff

This may help...

;WITH DuplicateData AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email_address, job_id
                              ORDER BY (SELECT NULL)) AS [Rank]   -- DENSE_RANK ordered by the partition columns would rank every row 1 and delete nothing
    FROM TableName
)
DELETE d
FROM DuplicateData d
WHERE d.[Rank] > 1
GO

Check this: https://stackoverflow.com/a/56982523/5496498

Khairul Alam

You can try this (replace table_name with your table name):

;WITH cte AS (
    SELECT job_id, email_address,
           ROW_NUMBER() OVER (PARTITION BY job_id, email_address
                              ORDER BY job_id) AS [rn]
    FROM table_name
)
DELETE cte WHERE [rn] > 1

Another method, using a temp table:

SELECT DISTINCT * INTO #tmptbl FROM table_name   -- DISTINCT * keeps one copy of rows identical in every column
DELETE FROM table_name
INSERT INTO table_name
SELECT * FROM #tmptbl
DROP TABLE #tmptbl

I went with Gordon Linoff's suggestion in the end and it seems to have worked great.

My issue now is that I am left with a lot of empty space I need to reclaim for the OS.

Is shrinking the database the right thing to do?
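
For reference, the usual mechanism is DBCC SHRINKFILE; the logical file name and target size below are assumptions, and since shrinking can fragment indexes it is normally a one-off step after a large delete like this:

-- list the database's files and their logical names
EXEC sp_helpfile;

-- shrink a data file down to a target size in MB
DBCC SHRINKFILE (N'MyDatabase_Data', 10240);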

Matthew Stott