The merge implementation does a full outer join on source and target. Depending on which side-effecting clauses you specify, this can be reduced to a simpler join such as a left or inner join.
On top of the join result, compute scalar operators work out for each row which action is supposed to happen and which values are going to be used. This result is streamed into an operator that does the writes.
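To make the three stages concrete, here is a rough conceptual model in Python. It is not the engine's actual code: the function names (`choose_join`, `join_rows`, `compute_actions`, `apply_writes`) and the dict-based "tables" are made up for illustration, and the join-selection rule is my simplified reading of the reduction described above.

```python
from enum import Enum

MISSING = object()  # marks the side of the join that had no matching row


class Action(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


def choose_join(not_matched_by_target, not_matched_by_source):
    """Hypothetical rule: pick the narrowest join that still yields every row a clause needs."""
    if not_matched_by_target and not_matched_by_source:
        return "full outer"
    if not_matched_by_target:
        return "left outer, preserving source rows"
    if not_matched_by_source:
        return "left outer, preserving target rows"
    return "inner"


def join_rows(target, source):
    """Full outer join on the key, yielding (key, target_value, source_value)."""
    # The key set is snapshotted into a list, so later writes do not disturb the iteration.
    for key in sorted(set(target) | set(source)):
        yield key, target.get(key, MISSING), source.get(key, MISSING)


def compute_actions(rows):
    """The 'compute scalar' step: decide the action and the value to write per row."""
    for key, t_val, s_val in rows:
        if t_val is MISSING:
            yield Action.INSERT, key, s_val   # row exists only in source
        elif s_val is MISSING:
            yield Action.DELETE, key, None    # row exists only in target
        else:
            yield Action.UPDATE, key, s_val   # row matched on both sides


def apply_writes(target, actions):
    """The write operator: consume the action stream and modify the target."""
    counts = {action: 0 for action in Action}
    for action, key, value in actions:
        if action is Action.DELETE:
            del target[key]
        else:
            target[key] = value
        counts[action] += 1
    return counts


target = {1: "a", 2: "b", 3: "c"}
source = {2: "B", 3: "c", 4: "d"}

# One pass: every row flows through join -> compute -> write exactly once.
counts = apply_writes(target, compute_actions(join_rows(target, source)))
```

After the call, `target` holds the same rows as `source`. Because the stages are chained generators, each joined row is handled once, which is the single-pass property the rest of this answer relies on.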
This description is very simplified. The difference from normal DML is almost zero if you specify only one side-effecting clause. This shows that merge does not have an inherent performance disadvantage.
In fact, it has an advantage in that it needs to pass over the data only once. Merge is often faster than multiple statements that do the same thing:
- The optimizer can see all DML at once
- One pass over the data instead of one per statement
- All index writes are sorted by index key. It's better to do this once instead of multiple times
- Per-statement overhead only once
Merge can use a little more CPU if you use it in a way that does not benefit from any of these points.
Performance really depends on the schema, the shape of the merge, and the data. I can construct cases where merge is slightly slower and cases where it is significantly faster.