Difference between performing INSERT INTO vs Using MERGE INTO

Question

I started working on a project that already has some scripts running and while I was looking some of the scripts that have already been done I encountered an scenario that looks something like this.

DROP TABLE IF EXISTS #Persons
SELECT PersonId = 1, Firstname = 'Joaquin', LastName = 'Alvarez'
INTO #Persons
DECLARE @inserted table (PersonId INT, Firstname VARCHAR(50), Lastname VARCHAR(50))

MERGE INTO dbo.Persons P USING #Persons TP ON 1 = 0 -- Forcing mismatch
        WHEN NOT MATCHED THEN
        INSERT 
        (
             PersonId,
             Firstname,
             Lastname
        )
        VALUES 
        (
             TP.PersonId,
             TP.Firstname,
             TP.LastName
        )

OUTPUT INSERTED.PersonId, INSERTED.Firstname, INSERTED.LastName
INTO @inserted;

My Question here is why they would use the merge into and force the mismatch just to perform an insert, they could have done the same without it with something like this.

DROP TABLE IF EXISTS #Persons
SELECT PersonId = 1, Firstname = 'Joaquin', LastName = 'Alvarez'
INTO #Persons
DECLARE @inserted table (PersonId INT, Firstname VARCHAR(50), Lastname VARCHAR(50))

INSERT INTO Persons
OUTPUT INSERTED.PersonId, INSERTED.Firstname, INSERTED.LastName
INTO @inserted
VALUES (1, 'Joaquin', 'Alvarez')

The first option is faster than the last one? or they're bot the same? This is the first time I see the merge into used this way.

I, personally, don't see the need for the `MERGE` here. *If*, however, the `MERGE` was outputting columns (in the `OUTPUT` clause) from objects other than `inserted`, it *would* make sense, but as it stands as the `MERGE` could be replicated with a `INSERT`. Perhaps it did in the past, or was meant to, and then didn't (and thus the `MERGE` statement was left). The only person who can truly answer why the wrote the above the way they did would be the person who wrote it. — Thom A, Mar 08 '21 at 12:10
Questions of the form "what was person X thinking when.." are not on topic for SO, sorry! — Caius Jard, Mar 08 '21 at 12:23
There is simply NO GOOD REASON to create and populate a temp table in this script, regardless of whether MERGE or INSERT is used. That alone tells me that the writer of this code tends to blindly apply patterns. — SMor, Mar 08 '21 at 12:30
@MatBailie Hey, I deleted it because I thought that as you mention, the question was due to a silly misstake and doesn't bring much value to have that kind of questions lying around SO. It's not like I dont appreciate the effort Gordon put into looking at my question, I just thought that due to the stupid nature of the question It will be closed sooner than later. Try to don't assume what people think and chill of a bit maybe? — Joaquín, Apr 29 '21 at 14:03
@Joaquín That's not how Stack Overflow works. Else there would be no rep scores, no upvotes, etc. Asking a question then deleting it when it's answered is simply rude. — MatBailie, Apr 29 '21 at 14:07

score 5 · Accepted Answer · answered Mar 08 '21 at 12:12

You would really need to ask the person who wrote the code. But I can think of two and a half reasons.

The first is that the original code was both an INSERT and UPDATE so the author used MERGE to handle the code. As the code was tested or as requirements changed, the person who wrote it realized that the UPDATE was not needed, but left the MERGE.

The half reason is that someone wrote the code expecting UPDATEs to be needed over time, so tried to future-proof the code.

The second reason is that the author may simply prefer MERGEs over UPDATEs and INSERTs because it is one statement that is more powerful than either of those individually. So, they simply always use MERGE.

Hey, thanks, I was wondering because Im not that experienced and believed that maybe the MERGE was faster for some reason. — Joaquín, Mar 08 '21 at 12:14
Merge should really be avoided in favor of sticking with update and insert; it has many bugs and edge-cases, many of which MS will not fix. — Stu, Mar 08 '21 at 12:43

Charlieface · Answer 2 · 2021-03-08T15:33:19.723

There are two big differences I can think of that may have caused the query to be written as MERGE:

1. MERGE allows column references in the OUTPUT clause that are not in the inserted table. Admittedly, the current query does not have that, but it is possible that other references were originally there or intended to be added.

2. Halloween Protection is much more efficient on MERGE.
Paul White goes into detail in his excellent article on this, but basically, when executing a INSERT/WHERE NOT EXISTS, this forces the optimizer to add a Table Spool to ensure correct results. A MERGE usually does not require this, as the optimizer can see what you are trying to do: fill the empty holes in the primary key.

I note that your query does not have a NOT EXISTS semantic, but again, it is possible that it was intended ast some point to have it.

As you state, #Person is entirely pointless here. Even with MERGE you can still do that from a constructed table, eg.

MERGE INTO dbo.Persons P
USING (
    SELECT *
    FROM (VALUES (1, 'Joaquin', 'Alvarez') ) AS v(PersonId, Firstname, LastName)
) TP ON 1 = 0 -- Forcing mismatch
        WHEN NOT MATCHED THEN
        INSERT (PersonId, Firstname, Lastname)
        VALUES (TP.PersonId, TP.Firstname, TP.LastName)
OUTPUT INSERTED.PersonId, INSERTED.Firstname, INSERTED.LastName
INTO @inserted;

Difference between performing INSERT INTO vs Using MERGE INTO

2 Answers2

Linked