
I have a source_table with 3 million rows in SQL Server. The primary key is the built-in SQL Server uniqueidentifier type.

I want to copy all 3 million rows into 4 tables:

Table1 has the main information, such as the uniqueidentifier, book_title, book_author, and book_price. Table2, Table3, and Table4 will all have different columns, but they will share the same uniqueidentifier primary key as Table1, and that primary key will also be a foreign key referencing Table1's uniqueidentifier primary key.

Copying from source_table takes a long time because Table1, Table2, Table3, and Table4 each have 50 million rows. It is slow, and I want to improve performance. My code is below. Does anyone have thoughts on improving performance, even by a little bit? Every day the source_table is populated, and I must reinsert into Table1, Table2, Table3, and Table4.

Thx for your suggestions.

insert into Table1 (UID, book_title, book_author, book_price)
select UID, title, author, price
from source_table

insert into Table2 (UID, col2, col3, col4)
select UID, col2value, col3value, col4value
from source_table

insert into Table3 (UID, col2, col3, col4)
select UID, col2value, col3value, col4value
from source_table
James Rodriguez
  • Why do you have to make 4 copies of 50 million rows every day? As a side note, if you have a uniqueidentifier as your primary key, I hope you have a clustered index on something else, as the fragmentation will reach almost 100% with as few as a couple thousand rows. – Sean Lange Nov 06 '15 at 14:53
  • Possible duplicate of [copying a huge table data into another table in sql server](http://stackoverflow.com/questions/5296106/copying-a-huge-table-data-into-another-table-in-sql-server) – Ondrej Svejdar Nov 06 '15 at 14:53
  • Also, on DBA Stack, see if this helps you James: http://dba.stackexchange.com/questions/99367/insert-into-table-select-from-table-vs-bulk-insert – Seamus Nov 06 '15 at 14:56
  • Every day, new data is sent to the sourceTable and it must be copied to the 4 tables. The source table has 200 columns and send it to the 4 tables to be more organized, some cols in table1, others cols in Table2, etc to be more organized. After the copying, the sourceTable is truncated to allow for the next days load. The primary keys are non clustered. – James Rodriguez Nov 06 '15 at 14:58
  • Those `insert` statements are invalid. If you use `select` you can't use `values` - and if it _did_ work it would insert the same values over and over again, wouldn't it? (because the content of the variables doesn't change). And where do those variables come from in the first place? – a_horse_with_no_name Nov 06 '15 at 14:59
  • Hi Sean, please explain 'As a side note, if you have a uniqueidentifier as your primary key I hope you have a clustered index on something else as the fragmentation will reach almost 100% with as few as a couple thousand rows' that may be important. Thx – James Rodriguez Nov 06 '15 at 14:59
  • thanks horse with no name, you are right that is what i had, i updated my code. – James Rodriguez Nov 06 '15 at 15:02
  • That will be huge transactions. Try smaller chunks. – jarlh Nov 06 '15 at 15:04
  • Seamus: bulk insert copies from a text file to tables. I want to copy from a table to other tables. Jarlh: what do you mean by smaller chunks? I can certainly try that. Thx – James Rodriguez Nov 06 '15 at 15:06
  • I tried ssis multicast to write into the 4 tables and it was still slow. Is the uniqueidentifier primary key and foreign key the problem? – James Rodriguez Nov 06 '15 at 15:08
    @JamesRodriguez there are literally hundreds and hundreds of articles about the index fragmentation problems when using a guid column as the clustered index. Here is one of them. http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/ You can get away with it but you have to constantly defrag your index. It is generally better to use another column as the clustered index. – Sean Lange Nov 06 '15 at 15:15
  • Hi Sean Lange. Thx for your input. I have a NON clustered index on guid. The reason I have a pk on guid is because it is the only thing that is truly unique, i had no choice because no other columns are guaranteed to be unique. – James Rodriguez Nov 06 '15 at 16:02

1 Answer


Try using INSERT INTO…SELECT to bulk import data with minimal logging (see the MSDN article).

Minimal logging for this statement has the following requirements:

  • The recovery model of the database is set to simple or bulk-logged.
  • The target table is empty or is a nonempty heap.
  • The target table is not used in replication.
  • The TABLOCK hint is specified for the target table.

    -- Temporarily set the recovery model to BULK_LOGGED.
    ALTER DATABASE MyDB
    SET RECOVERY BULK_LOGGED;
    GO

    -- You may have to drop the clustered index here

    INSERT INTO Table1 WITH (TABLOCK)
         (UID, book_title, book_author, book_price)
    SELECT UID, title, author, price
    FROM source_table;

    -- Recreate the clustered index here

    -- Reset the recovery model.
    ALTER DATABASE MyDB
    SET RECOVERY FULL;
    GO

    -- NOW DO A FULL BACKUP
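The comments also suggest trying smaller chunks, which keeps each transaction (and the log growth) small. A minimal sketch of a batched copy, assuming the question's table and column names; the batch size, the alias names, and the NOT EXISTS check used to skip already-copied rows are all assumptions, so test against your own schema:

```sql
-- Copy in batches of 100,000 rows to keep transactions small.
-- Batch size is an assumption; tune it for your hardware.
DECLARE @BatchSize INT = 100000;
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO Table1 (UID, book_title, book_author, book_price)
    SELECT TOP (@BatchSize) s.UID, s.title, s.author, s.price
    FROM source_table AS s
    WHERE NOT EXISTS (SELECT 1 FROM Table1 AS t WHERE t.UID = s.UID);

    -- @@ROWCOUNT is 0 once every source row has been copied.
    SET @Rows = @@ROWCOUNT;
END
```

Note the trade-off: only the first batch into an empty table can be minimally logged, so batching and the TABLOCK/minimal-logging approach above pull in different directions; measure both.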

Steve Ford
  • 7,433
  • 19
  • 40
  • thanks for your input Steve Ford, I will try that. When you typed '-- recreate clustered index here'. The uniqueidentifier primary key I have is nonclustered. It is the built in sql server uniqueidentifier data type. Should I switch to clustered? – James Rodriguez Nov 06 '15 at 15:13
  • @JamesRodriguez As mentioned in other comments, having a clustered index on uniqueidentifier is generally a bad thing. Effectively, new keys are random, so any new row may be inserted anywhere in your existing pages, unless you use workarounds like `newsequentialid()`. There are some circumstances where this isn't awful, but it's unlikely to be great. What *is* your clustered index? We could do with seeing the definition of your tables... – Matt Gibson Nov 06 '15 at 15:31
  • @JamesRodriguez you should try dropping the primary key before the insert and recreating afterwards. There is no easy answer you should test with and without the indexes. – Steve Ford Nov 06 '15 at 15:45
  • hi Matt Gibson thx for your input, I have a NON-clustered index on the uniqueidentifier. – James Rodriguez Nov 06 '15 at 15:59
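As a footnote to the `newsequentialid()` mention above: if the GUID key has to stay, a `NEWSEQUENTIALID()` default makes newly generated keys roughly ascending, which avoids the random-insert fragmentation Sean described if the key is ever made clustered. A minimal sketch; the constraint name is made up, and it only helps rows inserted without an explicit UID (so not rows copied verbatim from source_table):

```sql
-- Hypothetical constraint name; applies only when UID is not supplied.
ALTER TABLE Table1
ADD CONSTRAINT DF_Table1_UID DEFAULT NEWSEQUENTIALID() FOR UID;
```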