6

Why don't SQL support the upsert use case? I am not asking how to do in a particular db(*). What I want to know is why upsert is not a fundamental operation like insert and update. I mean it is a pretty straightforward use case right? I guess there must be some fundamental db first principle that gets broken when doing upsert or some technical challenge that the db engine faces when confronted with a upsert.

*. I know about the mysql syntax and the SQL Merge syntax. BTW even while using such db specific syntax you need to be careful about atomicity and locking. And using the merge syntax, it doesn't feel right having to create a psuedo table.

Edit: I am editing this to clarify that I am not asking an opinion. So I dont think this question should be blocked.

ad77
  • 96
  • 7
  • 3
    You should have asked Dr. E.F. Codd *before* he designed SQL ..... – marc_s Aug 05 '14 at 13:15
  • 2
    You may as well ask why `UPDATE` *is* considered a fundamental operation, since it can always be simulated by `INSERT` and `DELETE`, with appropriate consideration of ACIDity. – Damien_The_Unbeliever Aug 05 '14 at 13:17
  • Several dbms's now do support `merge` which handles the `upsert` use case. [btw: it wasn't Codd, it was Chamberlin & Boyce] – Paul Maxwell Aug 05 '14 at 13:30
  • 1
    @Damien_The_Unbeliever Update cannot be simulated by delete because the record can have references. Deleting and inserting would now require the entire reference chain to be deleted and recreated. – ad77 Aug 06 '14 at 10:16
  • @user2067753 As I mentioned in my question, the SQL merge is not designed for the single record use case. You would need a second table which you can bypass by using a temporary table. – ad77 Aug 06 '14 at 10:18
  • @ad77 - only in database systems that don't allow you to defer constraint checking until the end of the transaction. – Damien_The_Unbeliever Aug 06 '14 at 10:20
  • @marc_s I am interested in knowing of a design decision in the design of SQL. I am not asking for people's opinions. – ad77 Aug 06 '14 at 10:23
  • @Damien_The_Unbeliever Since the option of deferring the check is left to the programmer (schema designer ?), it makes sense for the update command to exist. Do you not agree? – ad77 Aug 06 '14 at 10:59
  • 1
    " ... knowing of a design decision .." then go to the source http://www.cs.uakron.edu/~echeng/db/sequel.pdf – Paul Maxwell Aug 06 '14 at 11:09
  • @user2067753 That does not have what I am asking. What I am asking is akin to "why no pointers in java". The right answer would be "to reduce language complexity". Your response above is like "see the java language reference". – ad77 Aug 06 '14 at 11:15
  • I think you also need to drop the prejudice against needing a table. Tables are fundamental units to SQL, why would you compare records in tables to some other structure? Merge can be used for any number of "upserts", and, as far as I know there is no restriction to say you cannot use it for a singleton. (ps. a temp table is still a table, perhaps you were thinking of a CTE?) – Paul Maxwell Aug 06 '14 at 11:16
  • Correct, I am pointing you to documentation because it actually discusses the reasons for how it was designed. Have you read it? It will give you more insight than I can provide here that's for sure. I'm afraid I cannot stipulate the specific design decision for absence of an "upsert" (a term that didn't exist in 1974) and you will be lucky indeed to find the person(s) who did make that decision otherwise it is conjecture. – Paul Maxwell Aug 06 '14 at 11:21
  • I dont have a prejudice against virtual tables. But please see the syntax for upsert in mysql here [link](dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html). Vs the sql server way [link](http://stackoverflow.com/questions/2479488/syntax-for-single-row-merge-upsert-in-sql-server) and in oracle [link](http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table). Which do you think is cleaner and intuitive. – ad77 Aug 06 '14 at 11:26
  • @user2067753 Dude, that is a language ref manual. Please atleast read the abstract on the first page. Just referring the K&R C ref book, doesnt make one an expert in C. A lot of other db concepts didnt exist in 1974 but they do now. All I want to know why is implementing upsert so difficult. If you do not know the answer why are you replying. – ad77 Aug 06 '14 at 11:39
  • "*All I want to know why is implementing upsert so difficult*" - [Why is UPSERT so complicated?](https://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/) –  Jun 24 '20 at 15:26

1 Answers1

3

Because it isn't easily handable, both acid and syntax-wise.
The conditions for update if exists isn't clear.

For example, replace "insert into" with upsert in the below query

insert into t_something 
select * from t_whatever

No foreign keys, no primary keys.
How do you want to update ?
Would the where condition be for the select, or for the update ?

Ultimately, you have to write the condition, and then you can just as well do a "update/insert if"...

Usually, when you're asking yourself the upsert question, you're handling inserting/updating wrong.
You're thinking in object terms instead of set terms.

You want to loop through an array of objects, and insert if count(*) on exists is 0 else update.

That's how object-oriented imperative programming works, but that's not how SQL works.

In SQL, you operate with a SET.
You can easily do a inner join - update on the SET
and a left join where null insert on the same SET.
That's just as comfortable as a merge, and a lot more readable plus simpler to debug.
And it might well be faster.

You can already ensure it's all atomic by putting update & insert into a transaction.

Thinking of upsert, which idiotism do you want next ? "UpSertLeteTrunc" ? MerDel ?
Or perhaps truncsert ?

There are more important things to do, by far.

This is how I do Upsert with MERGE on SQL-Server:

-- How to create the XML 
/*
DECLARE @xml XML 
SET @xml = ( SELECT (SELECT * FROM T_Benutzer FOR XML PATH('row'), ROOT('table'),  ELEMENTS xsinil) AS outerXml )
-- SELECT @xml 
*/


DECLARE @xml xml 
SET @xml = '<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <PLK_UID>7CA68E6E-E998-FF92-BE70-126064765EAB</PLK_UID>
    <PLK_Code>A2 Hoch</PLK_Code>
    <PLK_PS_UID>6CF3B5AB-C6C8-4A12-8717-285F95A1084B</PLK_PS_UID>
    <PLK_DAR_UID xsi:nil="true" />
    <PLK_Name_DE>Mit Legende</PLK_Name_DE>
    <PLK_Name_FR>Avec Légende</PLK_Name_FR>
    <PLK_Name_IT>Con Leggenda</PLK_Name_IT>
    <PLK_Name_EN>With Legend</PLK_Name_EN>
    <PLK_IsDefault>0</PLK_IsDefault>
    <PLK_Status>1</PLK_Status>
  </row>
</table>'


DECLARE @handle INT  
DECLARE @PrepareXmlStatus INT  

EXEC @PrepareXmlStatus = sp_xml_preparedocument @handle OUTPUT, @XML


;WITH CTE AS 
(
    SELECT 
         PLK_UID
        ,PLK_Code
        ,PLK_PS_UID
        ,PLK_DAR_UID
        ,PLK_Name_DE
        ,PLK_Name_FR
        ,PLK_Name_IT
        ,PLK_Name_EN
        ,PLK_IsDefault
        ,PLK_Status
    FROM OPENXML(@handle, '/table/row', 2) WITH 
    (
         "PLK_UID" uniqueidentifier 'PLK_UID[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Code" character varying(10) 'PLK_Code[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_PS_UID" uniqueidentifier 'PLK_PS_UID[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_DAR_UID" uniqueidentifier 'PLK_DAR_UID[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Name_DE" national character varying(255) 'PLK_Name_DE[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Name_FR" national character varying(255) 'PLK_Name_FR[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Name_IT" national character varying(255) 'PLK_Name_IT[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Name_EN" national character varying(255) 'PLK_Name_EN[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_IsDefault" bit 'PLK_IsDefault[not(@*[local-name()="nil" and . ="true"])]'
        ,"PLK_Status" int 'PLK_Status[not(@*[local-name()="nil" and . ="true"])]'
    ) AS tSource 

    WHERE (1=1) 
    
    -- AND NOT EXISTS 
    -- (
    --  SELECT * FROM T_VWS_Ref_PdfLegendenKategorie 
    --  WHERE T_VWS_Ref_PdfLegendenKategorie.PLK_UID = tSource.PLK_UID 
    --)
)
-- SELECT * FROM CTE     
MERGE INTO T_VWS_Ref_PdfLegendenKategorie AS A 
USING CTE ON CTE.PLK_UID = A.PLK_UID 
WHEN MATCHED 
    THEN UPDATE
        SET  A.PLK_Code = CTE.PLK_Code
            ,A.PLK_PS_UID = CTE.PLK_PS_UID
            ,A.PLK_DAR_UID = CTE.PLK_DAR_UID
            ,A.PLK_Name_DE = CTE.PLK_Name_DE
            ,A.PLK_Name_FR = CTE.PLK_Name_FR
            ,A.PLK_Name_IT = CTE.PLK_Name_IT
            ,A.PLK_Name_EN = CTE.PLK_Name_EN
            ,A.PLK_IsDefault = CTE.PLK_IsDefault
            ,A.PLK_Status = CTE.PLK_Status
WHEN NOT MATCHED BY TARGET THEN 
INSERT 
(
     PLK_UID
    ,PLK_Code
    ,PLK_PS_UID
    ,PLK_DAR_UID
    ,PLK_Name_DE
    ,PLK_Name_FR
    ,PLK_Name_IT
    ,PLK_Name_EN
    ,PLK_IsDefault
    ,PLK_Status
)
VALUES
(
     CTE.PLK_UID
    ,CTE.PLK_Code
    ,CTE.PLK_PS_UID
    ,CTE.PLK_DAR_UID
    ,CTE.PLK_Name_DE
    ,CTE.PLK_Name_FR
    ,CTE.PLK_Name_IT
    ,CTE.PLK_Name_EN
    ,CTE.PLK_IsDefault
    ,CTE.PLK_Status
)
-- WHEN NOT MATCHED BY SOURCE THEN DELETE
;


EXEC sp_xml_removedocument @handle 
Stefan Steiger
  • 78,642
  • 66
  • 377
  • 442
  • 1
    Sorry I think you didnt get the question or I didnt do a good job of asking. I am not asking how to do this scenario. But since so far in my programming career I have seen multiple use cases for the upsert (just google for it), I wonder why it doesn't already exist. As for your response, I guess you are not familiar with the mysql syntax for this. Just have a look [Link]http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html .It does not matter whether you are operating on a single row or a set. If the key already exists then update, else insert. – ad77 Aug 06 '14 at 10:50
  • @ad77: Yes, but they use ON DUPLICATE KEY, which will fail if there is no key, or the key is not the only criteria. If it wants to be useful, you'll need to provide the update criteria. And with such a most-simple example, I dont' see much of a difference to IF EXISTS(SELECT * FROM T_Whatever WHERE a = @a) UPDATE T_Whatever SET c = @c+1 ELSE INSERT INTO table (a,b,c)VALUES (1,2,3) – Stefan Steiger Aug 06 '14 at 12:08
  • pls dont take this any other way. Just google for "approaches to upsert" to find out what is wrong with this solution and what many others have been trying. Here's a link to get you started. [link](http://www.the-art-of-web.com/sql/upsert/). – ad77 Aug 06 '14 at 14:31
  • Also read this one.[Link](http://samsaffron.com/blog/archive/2007/04/04/14.aspx) – ad77 Aug 06 '14 at 14:37
  • 1
    Postgres adds support for upsert [link](https://news.ycombinator.com/item?id=9509870) .. Just commenting for reference. – ad77 May 08 '15 at 05:16
  • "*Ultimately, you have to write the condition, and then you can just as well do a "update/insert if"...*" no, you can not as that is prone to race conditions. –  Jun 24 '20 at 15:20
  • "*Because it isn't easily handable, both acid and syntax-wise*" and yet, Postgres managed to do just that –  Jun 24 '20 at 15:24
  • @a_horse_with_no_name: Yep, agree with PG. But it should not be prone to race conditions, if you put it into a transaction. Anyway, added how I do it with merge. Note that last time I testet, PG didn't support using CTEs in the MERGE clause. Maybe I did something wrong, or maybe newer PG versions can. But last time I tested, I couldn't get it to work in time for quitting time. If CTE aren't supported, you have to get your data twice ==> Code duplication, unless you want to create a table-valued function. – Stefan Steiger Jun 24 '20 at 15:38
  • I you do insert/update - even in a transaction - other transactions could insert the same rows and your insert would either fail or generate a duplicate row - that **is** a race condition. The same is true for the MERGE statement. That's why Postgres didn't implement MERGE but their own syntax, because MERGE isn't safe against race conditions - at least not in the (rather unclear) way the SQL standard defines it. –  Jun 24 '20 at 15:40
  • @a_horse_with_no_name: Hmm, that is not good, an insert failure shouldn't happen. That one update overwrites another is less of a surprise. It's a good thing then that the company I work for has a database with a low amount of writes ... – Stefan Steiger Jun 24 '20 at 16:01
  • That you can have tables with no primary keys is no point in why upsert is not available as a primary sql command. You just give an error when trying to upsert on a table wihtout primary keys. Upserts need to be atomic to get best performance on the locking etc. It's way more complicated than just doing a simple merge. So yes, it should be available at the core for an SQL server. Bulk upserts should also be available. – Atle S Feb 25 '21 at 13:29