2

I need to update a huge table (over 200 million records, 20+ columns)

I tried to update one column:

update Table1 set [Customer]=Null where [Customer]='-' or len([customer])=0

And it took over 2 hours. I tried it on all columns and it's still running, for over 5 days now.

update Table1 set [Name]=Null where [Name]='-' or len([Name])=0
update Table1 set [Email]=Null where [Email]='-' or len([Email])=0
...

BTW- the table does not have any indexes or triggers, only data. The DB is not in use and recovery mode is simple.

Is there any more efficient way to update big tables?

squillman
  • 13,363
  • 3
  • 41
  • 60
rotem
  • 115
  • 1
  • 8
  • 1
    You might want to select INTO a new table, then replace the old one. Is that an option? – Aadmaa Oct 20 '21 at 16:28
  • 1
    Can you just create some indexes and drop them later if you don't want to carry them around? Why are you using `len([Col])=0` instead of `[Col]=''` for (n)(var)char columns? – squillman Oct 20 '21 at 16:32
  • @Aadmaa,im open to options but i cant see how it faster, its still need to go over every row and every column – rotem Oct 20 '21 at 16:32
  • 1
    You might give it a shot; I'm pretty sure it'll be an order of magnitude faster. Add the indexes after the select. – Aadmaa Oct 20 '21 at 16:35
  • @squillman, columns are nvarchar. what is the difference between len([Col])=0 and [Col]=''? – rotem Oct 20 '21 at 16:35
  • The difference is that `len([Col])=0` is not [SARGable](https://stackoverflow.com/q/799584/75852). The `len` function has to be executed for every row in the table and even if you had an index on the column the index can't be used. If you're using that to check for empty string values then you're much better off using `Col=''`. Then you can add an index for the column and the optimizer will be much happier. – squillman Oct 20 '21 at 16:38
  • @squillman, i can understand that on indexed column, but this table got any index. i will try it on 1 column , but now im afraid to cancel the 5 days query so it wont take another 5 days of canceling. – rotem Oct 20 '21 at 16:49
  • Checking a value is still cheaper than executing a function against it. Why can't you just add some indexes and then drop them later if you don't to keep them? – squillman Oct 20 '21 at 16:50
  • @squillman, i will try the Col=''. about add indexes, for that i will need to index every column (20+, some are nvarchar(max)). my main purpose is to set empty or '-' records to null in the whole table. – rotem Oct 20 '21 at 16:59
  • So you index every column and the updates run in a fraction of the time. Unless you don't have the physical storage to hold the indexes I don't see what the drawback is. Unless your values are just next to totally unique it's certainly not going to take 5+ days to run with indexes, and the indexes certainly aren't going to take that long to build. If you have the ability to use indexes, that's what will save you here. – squillman Oct 20 '21 at 17:00
  • 5 days is simply not a slow update, did you check for blocking? And did you really mean "no index" - like, the table doesn't have a clustered index or any non-clustered indexes? – Aaron Bertrand Oct 20 '21 at 17:55

0 Answers0