1

I am using the MERGE feature to insert data into a table using a bulk import table as source. (as described here)

This is my query:

DECLARE @InsertMapping TABLE (BulkId int, TargetId int);
MERGE dbo.Target T
USING dbo.Source S
ON 0=1 WHEN NOT MATCHED THEN 
    INSERT (Data) VALUES (Data)
OUTPUT S.Id BulkId, inserted.Id INTO @InsertMapping; 

When evaluating the performance by displaying the actual execution plan, I saw that there is a high cost sorting done on the primary key index. I don't get it because the primary key should already be sorted ascending, there shouldn't be a need for additional sorting.

Execution Plan!

Because of this sort cost the query takes several seconds to complete. Is there a way to speed up the inserting? Maybe some index hinting or additional indices? Such an insert shouldn't take that long, even if there are several thousand entries.

Community
  • 1
  • 1
Danielku15
  • 1,490
  • 1
  • 13
  • 29
  • Why is your merge condition 1=0? How do you match existing and new rows? – OzrenTkalcecKrznaric Jul 24 '13 at 08:57
  • If you are just trying to insert data, why not use the INSERT statement (it also supports the OUTPUT clause)? This MERGE ... ON 0=1 condition doesn't look good. If you actually have a merge condition that does happen sometimes (unlike 0=1), could you tell us about it ? – Jonathan P. Jul 24 '13 at 09:08
  • 2
    @JonathanP. - The reason for using `MERGE` in this case is to access the additional columns in the `OUTPUT` clause. This is not supported with `INSERT` [as described here](http://stackoverflow.com/questions/5365629/using-merge-output-to-get-mapping-between-source-id-and-target-id). Danielku15: Could you upload the XML for the actual execution plan? – Martin Smith Jul 24 '13 at 10:41
  • Oh ok, thank you Martin Smith, didn't know about that. Quite useful to retrieve generated Ids! – Jonathan P. Jul 24 '13 at 11:37
  • I uploaded a "cleaned" execution plan : https://gist.github.com/Danielku15/420f2a7905f54a0dcc70 This statement inserted 71039 rows to the target table. – Danielku15 Jul 25 '13 at 10:24
  • So did you try the approach in my answer on your data? Did it get rid of the sort? Did it make things better or worse? – Martin Smith Aug 01 '13 at 09:56
  • Yes, I was able to get rid of the sort. The insert performance was slightly improved that change. Thanks – Danielku15 Aug 07 '13 at 08:50

1 Answers1

4

I can reproduce this issue with the following

CREATE TABLE dbo.TargetTable(Id int IDENTITY PRIMARY KEY, Value INT)
CREATE TABLE dbo.BulkTable(Id int IDENTITY PRIMARY KEY, Value INT)

INSERT INTO dbo.BulkTable
SELECT TOP (1000000) 1
FROM   sys.all_objects o1, sys.all_objects o2

DECLARE @TargetTableMapping TABLE (BulkId   INT,TargetId INT);

MERGE dbo.TargetTable T
USING dbo.BulkTable S
ON 0 = 1
WHEN NOT MATCHED THEN 
  INSERT (Value)
  VALUES (Value)
OUTPUT S.Id AS BulkId,
       inserted.Id AS TargetId
INTO @TargetTableMapping; 

This gives a plan with a sort before the clustered index merge operator.

Plan

The sort is on Expr1011, Action1010 which are both computed columns output from previous operators.

Expr1011 is the result of calling the internal and undocumented function getconditionalidentity to produce an id column for the identity column in TargetTable.

Action1010 is a flag indicating insert, update, delete. It is always 4 in this case as the only action this MERGE statement can perform is INSERT.

The reason the sort is in the plan is because the clustered index merge operator has the DMLRequestSort property set.

enter image description here

The DMLRequestSort property is set based on the number of rows expected to be inserted. Paul White explains in the comments here

[DMLRequestSort] was added to support the ability to minimally-log INSERT statements in 2008. One of the preconditions for minimal logging is that the rows are presented to the Insert operator in clustered key order.

Inserting into tables in clustered index key order can be more efficient anyway as it reduces random IO and fragmentation.

If the function getconditionalidentity returns generated identity values in ascending order (as would seem reasonable) then the input to the sort will already be in the desired order. The sort in the plan would in that case be logically redundant, (there was previously a similar issue with unnecessary sorts with NEWSEQUENTIALID)

It is possible to get rid of the sort by making the expression a bit more opaque.

DECLARE @TargetTableMapping TABLE (BulkId   INT,TargetId INT);
DECLARE @N BIGINT = 0x7FFFFFFFFFFFFFFF

MERGE dbo.TargetTable T
USING (SELECT TOP(@N) * FROM dbo.BulkTable) S
ON 1=0
WHEN NOT MATCHED THEN 
  INSERT (Value)
  VALUES (Value)

OUTPUT S.Id AS BulkId,
       inserted.Id AS TargetId
INTO @TargetTableMapping; 

This reduces the estimated row count and the plan no longer has a sort. You will need to test whether or not this actually improves performance though. Possibly it might make things worse.

Community
  • 1
  • 1
Martin Smith
  • 438,706
  • 87
  • 741
  • 845