SQL Merge vs Check and Insert/Update in Java

Question

I have an Java(Spring) REST API endpoint where I get 3 data inputs and I need to Insert in the oracle database based on some unique ID using JDBCTemplate. But just to be sure something doesn't break, I want have a check first if I need to insert or just update.

1st Approach

Make a database call with a simple query like

 SELECT COUNT(*) FROM TABLENAME WHERE ID='ABC' AND ROWNUM=1

And based on the value of count, make a separate Database call for Insert or Update. (count would never exceed 1)

2nd Approach

Make one single MERGE query hit using jdbctemplate.update() that would look like

MERGE INTO TABLENAME 
USING DUAL ON ID='ABC'
WHEN MATCHED THEN UPDATE 
    SET COL1='A', COL2='B'
    WHERE ID='ABC'
WHEN NOT MATCHED THEN
    INSERT (ID, COL1, COL2) VALUES ('ABC','A','B')

Based on what I read on different sites, using MERGE is a bit more costly in terms of CPU reads based on an experiment on this site. But they have done it for purely for DB script use where they do it with 2 tables and my context of use is via API call and using DUAL.

I also read on this question that MERGE could result in ORA-0001: unique constraint and some concurrency issue.

I want to do this on a table on which some other operation is possible at the same time for a different row and a very very small chance for the same row value. So I want to know which approach to follow for such use case and I know this might be a common one but I could not find answer to what I'm looking for anywhere. I want to know the performance/reliability of both approach.

Your options 3 and 4 are: 3) always update first, if no rows updated -> insert. 4) Always insert first, if it fails for duplicate row -> update. Everything of course depends highly on what kind of update / insert pattern you have. And if the amount of calls isn't that big, just do what you feel the easiest. — James Z, Aug 14 '20 at 11:24

diziaq · Accepted Answer · 2020-08-17T16:05:04.763

Looking at the code running in concurrent sessions environment, after each atomic statement we need to ask "what if another session have just broken our assumption?" and make adjustments according to that.

Option 1. Count and decide INSERT or UPDATE

declare
  v_count int;
begin
  SELECT count(1) INTO v_count FROM my_table WHERE ...;

  IF v_count = 0 THEN
    -- what if another session inserted the same row just before this point?
    -- this statement will fail
    INSERT INTO my_table ...;
  ELSE
    UPDATE my_table ...;
  END IF;
end;

Option 2. UPDATE, if nothing is updated - INSERT

begin
  UPDATE my_table WHERE ...;

  IF SQL%COUNT = 0 THEN
    -- what if another session inserted the same row just before this point?
    -- this statement will fail
    INSERT INTO my_table ...;
  END IF;
end;

Option 3. INSERT, if failed - UPDATE

begin
  INSERT INTO my_table ...;
exception when DUP_VAL_ON_INDEX then
  -- what if another session updated the same row just before this point?
  -- this statement will override previous changes
   
  -- what if another session deleted this row?
  -- this statement will do nothing silently - is it satisfactory?
   
  -- what if another session locked this row for update?
  -- this statement will fail  

  UPDATE my_table WHERE ...;  
end;

Option 4. use MERGE

 MERGE INTO my_table
    WHEN MATCHED THEN UPDATE ...
    WHEN NOT MATCHED THEN INSERT ...
    
 -- We have no place to put our "what if" question,
 -- but unfortunately MERGE is not atomic,
 -- it is just a syntactic sugar for the option #1

Option 5. use interface for DML on my_table

 -- Create single point of modifications for my_table and prevent direct DML.
 -- For instance, if client has no direct access to my_table,
 -- use locks to guarantee that only one session at a time
 -- can INSERT/UPDATE/DELETE a particular table row.

 -- This could be achieved with a stored procedure or a view "INSTEAD OF" trigger.

 -- Client has access to the interface only (view and procedures),
 -- but the table is hidden.
     my_table_v                -- VIEW AS SELECT * FROM my_table
     my_table_ins_or_upd_proc  -- PROCEDURE (...) BEGIN ...DML on my_table ... END; 

  

 PROCEDURE my_table_ins_or_upd_proc(pi_row  my_table%ROWTYPE) is  
   l_lock_handle CONSTANT VARCHAR2(100) := 'my_table_' || pi_row.id;
   -- independent lock handle for each id allows
   -- operating on different ids in parallel
 begin
   begin
     request_lock(l_lock_handle);
     
     -->> this code is exactly as in option #2
     
     UPDATE my_table WHERE ...;
   
     IF SQL%COUNT = 0 THEN
       -- what if another session inserted the same row just before this point?
       -- NOPE it cannot happen: another session is waiting for a lock on the line # request_lock(...)
       INSERT INTO my_table ...;
     END IF;
     
     --<<
     
   exception when others then
     release_lock(l_lock_handle);
     raise;
   end;
   
   release_lock(l_lock_handle); 
 end;

Not going too deep into low level details here, see this article to find out how to use locks in Oracle DBMS.

Thus, we see that options 1,2,3,4 have potential problems that cannot be avoided in a general case. But they could be applied if the safety is guaranteed by domain rules or a particular design conventions.

Option 5 is bulletproof and fast as it is relies on the DBMS contracts. Nevertheless, this will be a prize for clean design, and it cannot be implemented if my_table is barenaked and clients rely on straightforward DML on this table.

I believe that performance is less important than data integrity, but let's mention that for completeness. After proper consideration it is easy to see that the options order according to the "theoretical" average performance is:

2 -> 5 -> (1,4) -> 3

Of course, the step of performance measuring goes after obtaining at least two properly working solutions, and should be done exclusively for a particular application under a given workload profile. And that is another story. At this moment no need to bother about theoretical nanoseconds in some synthetic benchmarks.

I guess currently we see that there will be no magic. Somewhere in the application it is required to ensure that every id inserted into my_table is unique.

If id values do not matter (95% of cases) - just go for using a SEQUENCE.

Otherwise, create a single point of manipulation on my_table (either in Java or in DBMS schema PL/SQL) and control the uniqueness there. If the application can guarantee that at most a single session at a time manipulates data in my_table, then it is possible to just apply the option #2.

Thanks. Option #3 works out best for me because it won't face the challenges you mentioned and also it is provided that this endpoint is for Update only. The update check is just to make sure nothing breaks. — Ayush Kumar, Aug 17 '20 at 12:46

SQL Merge vs Check and Insert/Update in Java

1 Answers1