25

I have a table in SQL Server 2008 R2 with close to a billion rows. I want to change the datatype of two columns from int to bigint. Running ALTER TABLE zzz ALTER COLUMN yyy twice works, but it's very slow. How can I speed the process up? I was thinking of copying the data to another table, dropping and recreating the original, and copying the data back (perhaps after switching to simple recovery mode), or somehow doing it with a cursor 1000 rows at a time, but I'm not sure whether either of those would actually lead to any improvement.
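
For reference, this is roughly what I'm running now (the real table and column names are different):

ALTER TABLE zzz ALTER COLUMN yyy1 bigint NULL;
ALTER TABLE zzz ALTER COLUMN yyy2 bigint NULL;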

user1417408
  • 253
  • 1
  • 3
  • 6
  • 2
    Can you explain what you are changing the data type from and to? – Aaron Bertrand May 25 '12 at 12:45
  • 1
    I can't see how you could use a cursor to change the column type? A column type is a change to _all_ values in a table. – Oded May 25 '12 at 12:46
  • Something like create a new column, copy data in a cursor, drop the old column, rename the new column. No idea if this works. – user1417408 May 25 '12 at 12:47
  • I think the `Alter` statement you supplied is your best bet here. – Mathew Thompson May 25 '12 at 12:47
  • http://stackoverflow.com/questions/1420176/change-huge-table-pk-column-data-type – Jon Crowell May 25 '12 at 12:50
  • How about transactions, should I use a single one for both statements, or one per each? Should I change recovery mode, or transaction isolation level (guess not, as this is a write)? – user1417408 May 25 '12 at 12:51
  • Maybe you should also consider archiving some of your data. I'm guessing that this has built up over a long period of time (otherwise you probably would have gone with a bigint right away), so you are probably holding onto a lot of data that you just don't reference, but which will affect the performance of your db. – Limey May 25 '12 at 14:00
  • @Limey You are absolutely right, but now is not the best time. – user1417408 May 25 '12 at 14:14

3 Answers

41

Depending on what change you are making, sometimes it can be easier to take a maintenance window. During that window (where nobody should be able to change the data in the table) you can:

  1. drop any indexes/constraints pointing to the old column, and disable triggers
  2. add a new nullable column with the new data type (even if it is meant to be NOT NULL)
  3. update the new column, setting it equal to the old column's value; you can do this in chunks of individual transactions (say, 10000 rows at a time using UPDATE TOP (10000) ... SET newcol = oldcol WHERE newcol IS NULL) and issue a CHECKPOINT after each batch to avoid overrunning your log (see the sketch after this list)
  4. once the updates are all done, drop the old column
  5. rename the new column (and add a NOT NULL constraint if appropriate)
  6. rebuild indexes and update statistics

The key here is that it allows you to perform the update incrementally in step 3, which you can't do in a single ALTER TABLE command.
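
To illustrate step 3, here is a minimal sketch of that batched update; dbo.zzz, oldcol and newcol are placeholder names, and it assumes the old column contains no NULLs (see the comments below for adjusting the WHERE clause when it does):

-- copy the old values across in batches of 10,000 rows
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.zzz
       SET newcol = oldcol
     WHERE newcol IS NULL;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy

    CHECKPOINT;                -- in SIMPLE recovery, lets log space be reused
END

Because each batch runs as its own autocommit transaction, the log only ever has to hold one batch at a time.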

This assumes the column is not playing a major role in data integrity - if it is involved in a bunch of foreign key relationships, there are more steps.

EDIT

Also, just wondering out loud (I haven't done any testing for this, but I'm adding it to the list): I wonder if page + row compression would help here. If you change an INT to a BIGINT, with compression in place SQL Server should still treat all values as if they still fit in an INT. Again, I haven't tested whether this would make an alter faster or slower, or how much longer it would take to add compression in the first place. Just throwing it out there.
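
If you wanted to experiment with that, enabling compression would look something like the following (dbo.zzz is a placeholder name, and data compression is an Enterprise Edition feature in 2008 R2); again, untested for this scenario:

-- compress the heap or clustered index
ALTER TABLE dbo.zzz REBUILD WITH (DATA_COMPRESSION = PAGE);

-- optionally compress the nonclustered indexes too
ALTER INDEX ALL ON dbo.zzz REBUILD WITH (DATA_COMPRESSION = PAGE);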

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • The DB will be offline. Will setting the recovery model to simple help? – user1417408 May 25 '12 at 13:01
  • In SIMPLE or FULL, your huge ALTER TABLE command will not be any different, since the entire thing is logged in both cases. With the above, SIMPLE might be a little better, but recovery model shouldn't matter if you pick an appropriate batch size. Also if the database is "offline" (I assume you don't mean ALTER DATABASE...SET OFFLINE), why does speed matter? The above would probably shave some percentage off the time, but it won't be mind-blowingly faster. What you're doing simply takes time. – Aaron Bertrand May 25 '12 at 13:02
  • Aaron, thanks for your help. The speed matters because the site using this DB will only be offline for a few hours. – user1417408 May 25 '12 at 13:09
  • Right, I was just saying it's not going to change your 4-hour alter to a 5 minute operation, it's still going to take time... – Aaron Bertrand May 25 '12 at 13:09
  • Aaron, do you have a sense of how this would perform, versus creating a new table, inserting into it, and renaming the table? I am thinking that the two alter statements and update require three passes through the table, versus two for copy and rename. – Gordon Linoff May 25 '12 at 13:11
  • 1
    @Gordon ALTER TABLE ... ADD nullable column should be a metadata-only operation and should be very fast - I wouldn't qualify that on the same level as "pass through the table." Also if you copy the entire table, you're moving around a lot more data than just the int column. Changing tables can also be very complicated if you have constraints, foreign keys, it will mess up dependencies, etc. I don't think it buys you much over working within the existing table. – Aaron Bertrand May 25 '12 at 13:14
  • That seems to be correct (I've just added non-NULL columns in the past). This question has been asked before: http://stackoverflow.com/questions/4311559/sql-server-performance-for-alter-table-alter-column-change-data-type – Gordon Linoff May 25 '12 at 13:27
  • These columns are nullable, so `WHERE newcol IS NULL` won't work. – user1417408 May 25 '12 at 14:24
  • 2
    @user1417408 Ok, if the source column is nullable then `WHERE newcol IS NULL AND oldcol IS NOT NULL`... – Aaron Bertrand May 25 '12 at 14:26
  • Hmm, with two columns I have to say `WHERE newcol1 IS NULL AND newcol2 IS NULL AND (oldcol1 IS NOT NULL OR oldcol2 IS NOT NULL)`. Will this have a big impact on speed? – user1417408 May 25 '12 at 14:39
  • Doubtful, and in the grand scheme of things, even more doubtful. But you could always test and let us know. :-) – Aaron Bertrand May 25 '12 at 14:41
  • Here is a follow-up on my attempts on a large table. The suggested solution works well on tables of 50 million rows or more. Above hundreds of millions of rows, it still becomes pretty time consuming, especially if your while-looped update statement isn't riding on the back of an index. Still, it's a good solution that avoids duplicating the entire table, just the one column. – Fastidious Mar 04 '15 at 19:49
  • Just want to share my experience: I updated a table with 23 million rows, and dropping the indexes on columns other than the old column also helped improve update speed dramatically @AaronBertrand – Rohit Gupta May 13 '20 at 03:35
0

Here is how I changed a column's datatype (int to decimal) on a huge table:

-- 1) add the new column with the new data type, allowing NULL for now
ALTER TABLE dbo.[Table] ADD [Column2] DECIMAL(18,2) NULL;

-- 2) copy data from the old column to the new one; with TOP this must be
--    repeated until no rows remain (assumes [Column1] contains no NULLs)
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) t
       SET t.[Column2] = t.[Column1]
      FROM dbo.[Table] t
     WHERE t.[Column2] IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END

-- 3) drop constraints (if any) on the old column, then drop the old column
ALTER TABLE dbo.[Table] DROP CONSTRAINT DF_Table_Column1;
ALTER TABLE dbo.[Table] DROP COLUMN [Column1];

-- 4) rename the new column to the old column's name
EXEC sys.sp_rename 
    @objname = N'dbo.[Table].Column2', 
    @newname = N'Column1', 
    @objtype = N'COLUMN';

-- 5) make the new column NOT NULL and re-create the default constraint
ALTER TABLE dbo.[Table] ALTER COLUMN Column1 DECIMAL(18,2) NOT NULL;
ALTER TABLE dbo.[Table] ADD CONSTRAINT DF_Table_Column1 DEFAULT (1) FOR [Column1];
mortenma71
  • 1,078
  • 2
  • 9
  • 27
-9

If you are using something like SQL Server Management Studio, you can just go to the table in the database, right-click, select 'Design', choose the column you want to edit, set it to bigint, and hit save. This changes the whole column, but the previous values remain as they are. It's good for allowing a table to 'grow out of' int into bigint, but it won't change the existing data as far as I know.

E_net4
  • 27,810
  • 13
  • 101
  • 139
Bevan
  • 566
  • 4
  • 7
  • 19