3

Can someone help me understand how TSQL merge actually works? and help me understand when the UPDATE/INSERT will be faster and when MERGE will be faster?

I know its a very vague question and not related to one particular case I am working on. Hope this is allowed on StackOverflow.

If you can give me relevant links that will work as well. Thanks.

Thanks for pointing me to: SQL Server : MERGE performance but I feel this is more related to the indexes then merge and hence my question is not a duplicate of this.

Community
  • 1
  • 1
Deep Kalra
  • 1,418
  • 9
  • 27
  • possible duplicate of [SQL Server : MERGE performance](http://stackoverflow.com/questions/18143533/sql-server-merge-performance) – Mithrandir Aug 21 '15 at 06:50
  • Obviously MERGE will have it's overhead because there will be some threads working on implementing it. But ultimately it boils down how convenient it makes the life of a developer. – SouravA Aug 21 '15 at 06:51
  • exactly, it making my life easier in writing query. and I don't have a lot many rows in the db I am working on just a million or so. Not able to decide if I should change it to Update and Insert. – Deep Kalra Aug 21 '15 at 06:53
  • @SouravA Can you point me to some blog or something explaining how additional threads related to merge are created? and how is there an overhead? – Deep Kalra Aug 21 '15 at 06:57
  • 1
    Well I am no expert, but I am guessing that if you use merge to update or delete, those will happen no matter what, there is no rocket science there. It's not that merge will make the inserts or updates `more efficient`. Obviously the compiler will first have to go through the merge construct, understand what it's saying and then do the operation. So, omitting merge will make code faster, howsoever minuscule it might be. But reason I would not care about that is because merge allows me to maintain and scale my code much better. Sorry, I don't have any resources on this, try googling. – SouravA Aug 21 '15 at 07:03
  • 4
    @Mithrandir - If you see closely, the accepted answer to the question tells nothing about the performance of `MERGE` but of the code in general. I don't think this is duplicate, but for this question SO is not the best platform. – SouravA Aug 21 '15 at 07:13

1 Answers1

1

The merge implementations does a full outer join on source and target. Depending on what side-effecting clauses you specified this can be reduced to easier joins such as left or inner joins.

From the join result there are compute scalar operators that compute what action is supposed to happen and what values are going to be used. This result is streamed into an operator that does the writes.

This is very simplified. The difference to normal DML is almost zero if you only specify one side-effecting clause. This shows that merge does not have an inherent performance disadvantage.

In fact it has an advantage in the sense that it needs to do pass over the data only once. Often, merge is faster than multiple statements doing the same thing.

  • The optimizer can see all DML at once
  • One pass over the data instead of one per statement
  • All index writes are sorted by index key. It's better to do this once instead of multiple times
  • Per-statement overhead only once

It can use a little more CPU if you use merge in a way that does not benefit from any of these points.

Performance really depends on the schema, the shape of the merge and on the data. I can construct you cases where merge is slightly slower and cases where it is significantly faster.

usr
  • 168,620
  • 35
  • 240
  • 369