How expensive is select distinct * query

Question

In sql server 2012, I have got a table with more than 25 million rows with duplicates. The table doesn't have unique index. It only has a non-clustered index. I wanted to eliminate duplicates and so, I m thinking of the below

select distinct * into #temp_table from primary_table
truncate primary_table
select * into primary_table from #temp_table

I wanted to know how expensive is select distinct * query. If my procedure above is very expensive, I wanted to know if there is another alternate way.

That looks a very painful way of clearing duplicates, not to mention what happens to the users when your truncate has cleared all the data out. [Here are some ideas](http://stackoverflow.com/q/3317433/314291) on how to delete duplicates more delicately - the accepted answer doesn't require a unique or primary key. — StuartLC, Mar 04 '14 at 05:41
Thats correct Stuart. Very painful. Thanks for the ideas. I would also like to know how sql-server manages to give unique rows (like does it uses something like a hash). — sunil_mlec, Mar 04 '14 at 05:46
Do a google for `sql uniquifier` - e.g. [here](http://stackoverflow.com/questions/12321905/why-does-sql-server-add-a-4-byte-integer-to-non-unique-clustered-indexes). Rows are still unique to SqlServer :) — StuartLC, Mar 04 '14 at 05:48
Don't forget that, after you've done this, put something in place (e.g. a primary key or unique key) to stop this happening again. Or do you want to have to perform this clean up more than once? — Damien_The_Unbeliever, Mar 04 '14 at 07:16

score 4 · Answer 1 · edited Nov 13 '18 at 23:32

4

I don't know how expensive it is, but an alternate way is to create another table with a primary key, insert all the data there and silently reject the duplicates as stated here

http://web.archive.org/web/20180404165346/http://sqlblog.com:80/blogs/paul_white/archive/2013/02/01/a-creative-use-of-ignore-dup-key.aspx

basically, using IGNORE_DUP_KEY

edited Nov 13 '18 at 23:32

Community

1
1

answered Mar 04 '14 at 05:40

Leo

6,480
4
37
52

How expensive is select distinct * query

1 Answers1

Linked