INSERT INTO CCP_D2
                (CONT_MAS_SID,
                 COMP_M_SID,
                 ITM_M_SID)
    SELECT DISTINCT CM.CONT_MAS_SID,
                    CMP_MAS.COMP_M_SID,
                    IM.ITM_M_SID
    FROM   CONT_MAS CM
           INNER JOIN CFP_C CCT
                   ON CM.CONT_MAS_SID = CCT.CONT_MAS_SID
                      AND CM.IN_STATUS <> 'D'
                      AND CCT.IN_STATUS <> 'D'
           INNER JOIN CFP_C_DET CCD
                   ON CCT.CFP_C_SID = CCD.CFP_C_SID
           INNER JOIN COMP_M CMP_MAS
                   ON CMP_MAS.COMP_M_SID = CCD.COMP_M_SID
                      AND CMP_MAS.IN_STATUS <> 'D'
           INNER JOIN IFP_C IFP_CONT
                   ON IFP_CONT.CFP_C_SID = CCT.CFP_C_SID
                      AND IFP_CONT.IN_STATUS <> 'D'
                      AND CM.CONT_MAS_SID = IFP_CONT.CONT_MAS_SID
           INNER JOIN IFP_C_DET ICD
                   ON IFP_CONT.IFP_C_SID = ICD.IFP_C_SID
           INNER JOIN ITM_M IM
                   ON IM.ITM_M_SID = ICD.ITM_M_SID
                      AND IM.IN_STATUS <> 'D'
    WHERE  NOT EXISTS (SELECT 1
                       FROM   CCP_D CD
                       WHERE  CD.CONT_MAS_SID = CM.CONT_MAS_SID
                              AND CD.COMP_M_SID = CMP_MAS.COMP_M_SID
                              AND CD.ITM_M_SID = IM.ITM_M_SID)

-- Number of records returned = 209519554

(screenshot of the estimated execution plan)

I have a SQL query that returns nearly 20 million records, and I want to insert the result into a table. It works fine with a smaller number of records, but once the row count reaches the millions it takes around 1 hour 30 minutes to complete, even though I have proper indexes on all of the joining columns.

I tried the following things:

  1. Split the insert into multiple batches using a WHILE loop, but it did not help in this case (the pattern I tried is sketched below).
  2. Created all of the recommended indexes, but performance did not improve as expected.
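
For reference, this is the shape of the batched insert I tried (a simplified sketch; [dbo].[SOURCE], [dbo].[Destination] and the ID column stand in for the real tables and key):

DECLARE @BatchSize INT = 10000;

WHILE 1 = 1
BEGIN
    -- Copy the next batch of rows that are not yet in the destination.
    INSERT INTO [dbo].[Destination] WITH (TABLOCK)
    SELECT TOP (@BatchSize) s.*          -- explicit column list in the real query
    FROM   [dbo].[SOURCE] s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   [dbo].[Destination] d
                       WHERE  d.ID = s.ID);

    -- The last batch comes back smaller than @BatchSize, so stop there.
    IF @@ROWCOUNT < @BatchSize
        BREAK;
END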

Note: I'd prefer not to use partitioning, since I'm using SQL Server Standard Edition.

Could you please suggest any way to improve the performance of this query?

StackUser
  • http://stackoverflow.com/questions/13722014/insert-2-million-rows-into-sql-server-quickly – mohan111 May 05 '15 at 06:03
  • @mohan: Thanks for sharing the link, but I am not looking for bcp or bulk insert, since I have to insert the result set of the SQL query into a table. – StackUser May 05 '15 at 06:05
  • Can you post your batch insert solution? – Felix Pamittan May 05 '15 at 06:08
  • My database is in Simple recovery mode – StackUser May 05 '15 at 06:08
  • http://dba.stackexchange.com/questions/94018/optimizing-queries-for-25-million-rows – mohan111 May 05 '15 at 06:10
  • DECLARE @BatchSize INT = 10000 WHILE 1 = 1 BEGIN INSERT INTO [dbo].[Destination] WITH (TABLOCK) SELECT TOP(@BatchSize) s.columns FROM [dbo].[SOURCE] s WHERE NOT EXISTS (SELECT 1 FROM dbo.Destination WHERE ID = s.ID) IF @@ROWCOUNT < @BatchSize BREAK END – StackUser May 05 '15 at 06:19
  • @mohan: I'm not able to implement the solution you posted. Could you please share any other ideas? – StackUser May 05 '15 at 06:44
  • DISTINCT is a horrible thing if you have a lot of records, because the query engine then has to sort the whole result! Do you have duplicates when you run the query without DISTINCT? – CeOnSql May 05 '15 at 07:03
  • You will have to identify the bottleneck(s) for this query. Inserting 200 million rows will never be a quick operation. On the one hand it depends on the query itself (as I said: DISTINCT, wrong/missing indexes, ...); on the other hand your hardware also influences query performance (network, CPU, memory, SSD/HDD performance). – CeOnSql May 05 '15 at 07:45
  • Post the execution plan. All commentary so far was just random guesses. The plan will show what's expensive. – usr May 05 '15 at 08:27
  • Please check the execution plan and suggest any ideas to improve the SQL; I've been searching for a solution for the past two days but unfortunately nothing has worked yet. – StackUser May 05 '15 at 09:50
  • The cardinality estimates for the select are totally wrong. It estimates 1 row but 20 are coming back. Post the actual execution plan (in other words let this run for a few hours...). Meanwhile take a look at the estimates in the plan and tell us where you think they are most wrong. There must be places where millions of rows should flow through. – usr May 05 '15 at 12:03

2 Answers


One thing to keep in mind is that indexes and constraints add extra overhead when you insert rows into a table: the index structure must be reorganized to place each new value into the index pages, and constraints are a mechanism that evaluates every row against a predefined expression.

If you want to achieve a faster import into the table, especially with a large amount of data, disable all constraints and indexes, and then re-enable them after the import process.

Of course, you must be sure that your new rows will not break database consistency and integrity.

If you disable the primary key constraint, and thus the clustered index, you must know that the foreign keys referencing it are also disabled automatically. When you re-enable the clustered index, or the primary key constraint, the foreign key constraints are not automatically re-enabled, so you must enable them manually.
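
For example, a rough sketch (the index name below is a placeholder; repeat the ALTER INDEX statements for each nonclustered index that actually exists on the target table):

-- Disable the nonclustered indexes on the target table.
-- Do NOT disable the clustered index, or the table becomes inaccessible.
ALTER INDEX IX_CCP_D2_Example ON dbo.CCP_D2 DISABLE;

-- Disable the foreign key and check constraints on the target table.
ALTER TABLE dbo.CCP_D2 NOCHECK CONSTRAINT ALL;

-- ... run the large INSERT ... SELECT here ...

-- Rebuilding a disabled index re-enables it.
ALTER INDEX IX_CCP_D2_Example ON dbo.CCP_D2 REBUILD;

-- Re-enable the constraints and re-validate existing data so they are trusted again.
ALTER TABLE dbo.CCP_D2 WITH CHECK CHECK CONSTRAINT ALL;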

veljasije
  • The problem is the select query; it alone takes more than an hour to fetch the records. I even planned to drop and recreate the indexes, but the same table is referenced in the NOT EXISTS clause. – StackUser May 05 '15 at 06:19

This may help you:

http://arsalantamiz.blogspot.com/2008/05/how-to-increase-performance-of-loading_13.html

https://shahanayyub.wordpress.com/2014/03/30/how-to-load-large-dataset-in-datagridview/

Hope this helps you with what you are looking for.

Adil Iqbal