
I have a huge table with about 100 million rows in SQL Server and want to change the datatype from varchar to nvarchar like this:

ALTER TABLE my_table 
    ALTER COLUMN comment NVARCHAR(250);

The problem is that my transaction log grows until the hard disk is full.

I have thought about some kind of batched approach, but this is basically only one statement.

Would it work to create a new table with the new datatype and copy the data over, like this?

CREATE TABLE my_table_new (comment NVARCHAR(250));

INSERT INTO my_table_new 
    SELECT * 
    FROM my_table;

-- or

SELECT comment
INTO my_table_new
FROM my_table;

DROP TABLE my_table;

What is the best approach to do this?

Any idea how I can avoid the extreme growth of the transaction log file?

Christian

2 Answers


You might find it faster to copy the data into a new table, truncate the old one, alter the column, and then re-insert:

select t.*
into temp_my_table
from my_table t;

truncate table my_table;  -- back it up first!

alter table my_table alter column comment nvarchar(max);  -- no need to be cheap here

insert into my_table
    select t.*
    from temp_my_table t;

One caveat: You may have to pay attention if you have insert triggers or identity columns.
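
If the table did have an identity column, the re-insert step would also need SET IDENTITY_INSERT so the existing values are preserved. A minimal sketch, assuming a hypothetical identity column named id:

set identity_insert my_table on;    -- only applies if my_table actually has an identity column

insert into my_table (id, comment)  -- an explicit column list is required with identity_insert
    select t.id, t.comment
    from temp_my_table t;

set identity_insert my_table off;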

Gordon Linoff
  • Thanks! I have no insert triggers or identity columns, which is good. Of course my table has more columns, which shouldn't be a problem, should it? Any hint for a script which does some of those conversions? Would you wrap each one (as described above) in its own transaction? Any other thoughts? – Christian Dec 08 '19 at 21:00
  • @Christian, you could convert as needed in the `SELECT...INTO` column list using `CAST` or `CONVERT` instead of afterward. That will allow you to convert multiple columns in one go. For NOT NULL columns, you may need to surround the conversion function with `ISNULL` to coerce a not-null result column. – Dan Guzman Dec 08 '19 at 21:28
  • @DanGuzman Thanks a lot for the tip for multiple columns. In general I think performance is not a problem; it's more about the growing transaction log in a database using the full recovery model. From what I have seen, batched transactions might help me. – Christian Dec 08 '19 at 21:35
  • I think I have found some hints, at least for my single-column case, here https://stackoverflow.com/questions/39419529/best-way-to-update-40-million-rows-in-batch and here https://stackoverflow.com/questions/10754665/change-column-types-in-a-huge-table but I am still unsure how I set up the batch in my case. Still unsure about the transaction log growth ... – Christian Dec 08 '19 at 21:58
  • @Christian, `SELECT INTO` is minimally logged in the `SIMPLE` and `BULK_LOGGED` recovery models. To avoid filling the log in the `FULL` model, you could load in batches with `INSERT...SELECT`, like in Gordon's answer, but with a limiting `WHERE` clause. – Dan Guzman Dec 09 '19 at 00:19
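
A minimal sketch of the batched approach described in the comments above; the key column id, the batch size, and the ISNULL hint are illustrative assumptions, not part of the original posts:

-- Assumes my_table has a numeric key column (here called id) to batch on,
-- and that my_table_new already exists with comment NVARCHAR(250).
DECLARE @batch_size INT = 100000;
DECLARE @min_id BIGINT, @max_id BIGINT;

SELECT @min_id = MIN(id), @max_id = MAX(id) FROM my_table;

WHILE @min_id <= @max_id
BEGIN
    INSERT INTO my_table_new (comment)
    SELECT CAST(t.comment AS NVARCHAR(250))   -- wrap in ISNULL(...) for NOT NULL target columns
    FROM my_table t
    WHERE t.id >= @min_id
      AND t.id <  @min_id + @batch_size;

    SET @min_id = @min_id + @batch_size;
END;

-- In the SIMPLE recovery model the log space is reused after each batch commits;
-- in FULL recovery, frequent log backups are needed to keep the log from growing.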

The Data Loading Performance Guide includes some discussion of the conditions for minimal logging.

If you can meet these conditions (including setting the database to the BULK_LOGGED or SIMPLE recovery model for the duration, using the TABLOCK hint, etc.), then you may be able to complete this without killing your transaction log.
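
A minimal sketch of what that setup could look like; the database name my_database and the specific statements are assumptions based on the general conditions the guide describes, so verify them for your own environment:

-- Take a log backup first, then switch to BULK_LOGGED for the duration of the load.
ALTER DATABASE my_database SET RECOVERY BULK_LOGGED;

-- SELECT ... INTO a new table is minimally logged under SIMPLE/BULK_LOGGED.
SELECT CAST(t.comment AS NVARCHAR(250)) AS comment
INTO my_table_new
FROM my_table t;

-- INSERT ... SELECT into an empty heap with TABLOCK can also qualify:
-- INSERT INTO my_table_new WITH (TABLOCK)
-- SELECT CAST(t.comment AS NVARCHAR(250)) FROM my_table t;

-- Switch back to FULL and take a backup afterwards to resume regular log backups.
ALTER DATABASE my_database SET RECOVERY FULL;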

Martin Cairney