0

I have been asked to look into a manual process that one of my colleagues is completing every now and again.

He sometimes needs to add a new column onto a large table (200 million rows), it is taking him more than 1 hour to do this. Before you ask, yes, the columns are nullable but sometimes the new column will have 90% data in it.

Instead of adding a new column to the existing table, he...

  1. Creates a new table
  2. Select (*) from old table (inserts into new)
  3. Adds the new column as part of his script

Then he deletes the old table and renames the new table back to the original, adds index and then compresses. He says it much quicker like that.

If this is the best way then I will try and write SSIS package to try and make the process more seamless

Any advice is welcome!

Thanks

Easy987us
  • 21
  • 4
  • Does the source table get updated ? – Liya Tansky Feb 09 '17 at 11:32
  • Yes - the source table is updated monthly right now, but will change to weekly soon – Easy987us Feb 09 '17 at 11:33
  • Why are you adding a new column so often? This could probably be better solved by slightly changing your data structure into something more normalised. Can you post the table definition and reasons for the new columns? – iamdave Feb 09 '17 at 11:33
  • Hopefully I can answer your question iamdave but I'm no expert.l The original table was 400 million records, the new table is somewhat an aggregation. I think what they are trying to do here is have a table with something like 50 columns, but the table is created in literally 40 - 50 steps with different data sources all over the place. So in fact my original post was incorrectly slightly. They repeat the step of deleting and recreating the table multiple times every month. – Easy987us Feb 09 '17 at 11:36
  • For some reason I am unable to edit my previous comment. I have asked some more questions, there are actually 14 tables with about 500 columns. The first table is used to created the additional 13 columns. They all have the same primary key but has been split into lots of tables due to the sheer number of columns as performance is literally rubbish if trying to write against 1 table with 200 million records and 500 columns. – Easy987us Feb 09 '17 at 11:53
  • Which version of SQL Server? This technique became obsolete around version 2000 when MS implemented a feature to add columns directly. – Disillusioned Feb 09 '17 at 12:54
  • Possible duplicate of [Add a column with a default value to an existing table in SQL Server](http://stackoverflow.com/questions/92082/add-a-column-with-a-default-value-to-an-existing-table-in-sql-server) – Disillusioned Feb 09 '17 at 12:58
  • Possible duplicate of [How do I add a column to large sql server table](http://stackoverflow.com/questions/1645215/how-do-i-add-a-column-to-large-sql-server-table) – SqlZim Feb 09 '17 at 13:38

2 Answers2

0

creating a new table structure and moving all the data to that table and delete the prior table is a good way just for a few data,you can do it by wizard in SQL Server.
but it is the worst way for solving this problem(millions of data).

for large amount of data (millions of records) you should use "Alter Table".

Alter Table MyTable
ADD NewColumn nvarchar(10) null

the new column will add to the table as the last column. if you use this script it takes less that one second because all data will not moving,you just add a new column in to the table.

but if you use the wizard method as you mentioned with millions of data records it takes hours.

Ali Coder
  • 49
  • 6
0

as Ali says

alter Table MyTable ADD NewColumn nvarchar(10) null

but then to fill in 90% of data. As he has a table already with it in and the key he's joining on in the copy so this is all he needs:

UPDATE MyTable
SET [NewColumn] = b.[NewColumn]
FROM MyTable a INNER JOIN NewColumnTable b ON a.[KeyField]= b.[KeyField]

would be a lot quicker. You could do it in SSIS but if this happens a lot then not really worth it for a few lines of SQL.

Jim
  • 569
  • 6
  • 15