0

I have a table with 10 million records with no indexes and I am trying to dedupe the table. I tried the inserts with select where either using a left join or where not exists; but each time I get the error with violation of key. The other problem is that the log file grows too large and the transaction will not complete. I tried setting the recovery to simple as recommended online but that does not help. Here are the queries I used;

insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
  where not exists(select 1 from temp as t where t.profile=s.profile)

This just produces the violation of key error. I tried using the following:

insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s 
left outer join temp t on t.profile=s.profile where t.profile is null

In both instances now the log file fills up before the transaction completes. So my main question is about the log file and I can figure out the deduping with the queries.

vbNewbie
  • 3,291
  • 15
  • 71
  • 155

2 Answers2

2

You may need to work in batches. Write a loop to go through 5000 (you can experiment with the number, I've had to go as far down as 500 or up to 50,000 depending on the db and how busy it was) records or so at a time.

What is your key? Likely your query will need to pick using an aggreagate function on dataadded (use the min or the max function).

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • I was told that regardless of using batch I need to have a non clustered index on the source table which I do not have since it does contain dupes. or how can this be done then? – vbNewbie Mar 28 '12 at 15:52
  • thanks; I tried the batch idea and got the violation of Unique key constraint on the destination table. How do I get pass this? Thanks a mill for your help . – vbNewbie Mar 28 '12 at 21:04
  • please describe the key field(s) in the destination table – HLGEM Mar 28 '12 at 21:11
  • Destination Table :IX_Temp - profileUrl(ASC)--> unique key (non clustered) Source Table: IX_PURL - profileUrl(ASC) ---> index (non clustered, not unique – vbNewbie Mar 28 '12 at 21:42
1

the bigger the transaction, the bigger the transaction log will be.

The log is used for uncommitted recovery of an open transaction so if you’re not committing frequently and your executing a very large transaction, it will cause the log file to grow substantially. Once it commits, then the file will become free space. This is to safe guard the data in case something fails and roll back is needed.

my suggestion would be to run the insert in batches, committing after each batch

Diego
  • 34,802
  • 21
  • 91
  • 134
  • thanks for your response. Do you have any idea why I still get the violation of unique key error – vbNewbie Mar 28 '12 at 21:05
  • see here: http://stackoverflow.com/questions/6483699/unique-key-violation-in-sql-server-is-it-safe-to-assume-error-2627 – Jason Clark Nov 17 '15 at 05:42