How to merge rows + retrieve new and existing keys

Question

In an Oracle table (e.g. MYTABLE, with a numeric sequenced field as primary key), I have to insert several thousand of rows, but some of them are supposed to already exist in the table.

Naturally, I should try to use MERGE but I need, as well, to retrieve all created (when inserting) and existing (when updating) primary keys.

As well, it should be as fast as possible.

Is the following attempt (pseudo code) the only way to go? Thanks.

keys_list = empty array
for each row to merge
    do query 'SELECT PK_MYTABLE FROM MYTABLE WHERE PK_MYTABLE = '+row.pk_mytable
        ==> retrieve key
    if found then:
        add key to keys_list
    else:
        do query 'INSERT INTO MYTABLE (PK_MYTABLE, ...) VALUES (SEQ_MYTABLE.NEXTVAL, ...)'
        do query 'SELECT SEQ_MYTABLE.CURRVAL FROM DUAL' ==> retrieve key
        add key to keys_list

Related: http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table — Maxime Pacary, Aug 01 '11 at 13:50
Interesting: "Too bad MERGE doesn't support the RETURNING CLAUSE" http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5318183934935#11942022601143 — Maxime Pacary, Aug 01 '11 at 15:54

score 1 · Accepted Answer · answered Aug 01 '11 at 20:25

1

Add a MODIFICATION_DATE column to the table

Grab and save the sysdate.

When you merge update/insert the value of the sysdate as well.

When the merge is complete, select the rows where the MODIFICATION_DATE = SYSDATE and you have the set you are interested in.

answered Aug 01 '11 at 20:25

EvilTeach

28,120
21
85
141

Clever idea, but I think it could be improved a little bit, for example if two sets of merges are made exactly at the same time, I will end up with too many keys (from both merges). Maybe some kind of random hash (integer) should do the trick. – Maxime Pacary Aug 01 '11 at 20:31
Mmmh not perfect however... I may have similar problems than with using sysdate: one merge affecting some rows; commit; another merge affecting the same rows; commit; then if a select is made to retrieve the first merge primary keys, some keys won't be returned (rows updated by the second merge)... – Maxime Pacary Aug 01 '11 at 20:38
have each merge grab a next sequence number. each merge upserts with that value. select the rows where the sequence number matches.... It can get messy, if the merges update the same row :) – EvilTeach Aug 01 '11 at 20:39
another approach would be to delay the commit, until after you select out the keys you need. The other session won't see the data until you commit. – EvilTeach Aug 01 '11 at 20:41
I currently don't have enough time to test that... I'm currently trying to have my quick-and-dirty logic to work, and when it will be Ok I'll try to improve perf. I will keep you informed :) – Maxime Pacary Aug 02 '11 at 14:06

score 0 · Answer 2 · answered Aug 01 '11 at 13:57

0

Why can't you use a MERGE statement for this? This is exactly what a MERGE is for. Here is a rough idea of how it would look...

merge into mytable mt
using 
(
    select key_field, value_field from sourcetable
) st
on 
( mt.key_field = st.key_field )
when matched then update
    set mt.value_field = st.value_field
when not matched then insert
    ( key_field, value_field )
    values 
    ( st.key_field, st.value_field )
;

Using a MERGE statement is fast because it is a single statement and the Oracle optimizer can utilize indexes and choose a better explain path than iterating through a cursor using PL/SQL.

answered Aug 01 '11 at 13:57

Jordan Parmer

36,042
30
97
119

I said "Naturally, I should try to use MERGE but I need, as well, to retrieve all created (when inserting) and existing (when updating) primary keys.". MERGE doesn't seems to allow that. – Maxime Pacary Aug 01 '11 at 13:58
I'm not sure I understand. You might need to give a more complete example in the question. In a MERGE, if the key doesn't already exist in your source, it will create it using the INSERT statement. Otherwise, if it finds a matching key, it will use the UPDATE statement. I think the key to your problem is to define the source query appropriately. – Jordan Parmer Aug 01 '11 at 14:02
I guess I should be asking, what is the source of your keys? Is it a different table or based on logic in a procedure (or something else)? – Jordan Parmer Aug 01 '11 at 14:04
The source of the keys is a sequence (when inserting), or the primary key value itself, when the row with that PK already exists. As you can see after inserting I do "SELECT SEQ_MYTABLE.CURRVAL FROM DUAL" – Maxime Pacary Aug 01 '11 at 14:17
Okay. What I don't understand is how do you know when to increment your sequence? What triggers the need to do this? – Jordan Parmer Aug 01 '11 at 14:20
If the row already exists, the sequence does not need to be incremented. If a new row has to be inserted, the sequence needs to be incremented. But even if I increment the sequence in both cases, I need to retrieve the existing or inserted primary key values. – Maxime Pacary Aug 01 '11 at 15:34
1

Are the new rows coming from another table or an application? If it is coming from an application, you won't be able to use a MERGE (unless you cache the new inserts into a global temp table and then merge from that - which would be fast). Otherwise, you'll have to go with a stored procedure approach. – Jordan Parmer Aug 01 '11 at 16:42
New rows come from an application. Thanks for suggesting the global temp table. – Maxime Pacary Aug 01 '11 at 20:42

score 0 · Answer 3 · answered Aug 01 '11 at 15:30

If the keys are being generated from a sequence, then the normal way to get the key generated by that insert is to use the returning clause:

declare
  v_insert_seq integer;
begin
  insert into t1 (pk, c1)
  values (myseq.nextval, 'value') returning pk into v_insert_seq;
end;
/

However, as best as I can tell, the merge statement doesn't support that returning feature.

Depending on the source of your new rows, there are different ways you could do this. If you are inserting one row at a time, then the approach above will work pretty well.

To detect the duplicate records, just catch the exceptions when you are inserting (when dup_val_on_index) and then handle them with updates.

If your source of rows is another table, you probably want to look at bulk inserts, and allowing Oracle to return you an array of new PK values. I tried this, but couldn't get it working, so perhaps it's not supported (or I'm missing something today - it gives a syntax error):

declare
  type t_type is table of t1.pk%type;
  v_insert_seqs t_type;
begin

   insert into t1 (pk, c1)
   select level newpk, 'value' c1value
   from dual
   connect by level <= 10 returning pk bulk collect into v_insert_seqs;

exception 
  when dup_val_on_index then
    raise;
end;
/

The next best thing is to select the rows into arrays and then use bulk binds with the returning clause to capture the new PK IDs and also use Save Exceptions to catch all the rows that failed to inserted. Then you can process any of the failed inserted afterwards:

set serveroutput on
declare
  type t_pk is table of t1.pk%type;
  type t_c1 is table of t1.c1%type;
  v_pks t_pk;
  v_c1s  t_c1;
  v_new_pks t_pk;

  ex_dml_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(ex_dml_errors, -24381);

begin
  -- get the batch of rows you want to insert
  select level newpk, 'value' c1
  bulk collect into v_pks, v_c1s
  from dual connect by level <= 10;

  -- bulk bind insert, saving exceptions and capturing the newly inserted
  -- records
  forall i in v_pks.first .. v_pks.last save exceptions 
    insert into t1 (pk, c1)
    values (v_pks(i), v_c1s(i)) returning pk bulk collect into v_new_pks;

exception
 -- Process the exceptions 
  when ex_dml_errors then
    for i in 1..SQL%BULK_EXCEPTIONS.count loop
      DBMS_OUTPUT.put_line('Error: ' || i || 
          ' Array Index: ' || SQL%BULK_EXCEPTIONS(i).error_index ||
          ' Message: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE));
    end loop;
end;
/

Thanks. Interesting, but it seems that when processing exceptions, you cannot retrieve the PKs of the existing rows. — Maxime Pacary, Aug 01 '11 at 15:53
If you are using a sequence to generate the PKs, then you will never have a duplicate PK anyway. If you have another unique constraint violated based on the non-generated columns in the table / new row, then the error message should tell you which constraint was violated and hence you know the key as it is in your failed row already. — Stephen ODonnell, Aug 02 '11 at 09:10

EvilTeach · Answer 4 · 2011-08-01T20:51:02.823

0

If you are running Oracle 10 or better, you might be able to do much the same thing, for nearly free by issuing a commit before the merge to update the SCN, then after the merge, use the ORA_ROWSCN to detect which rows have changed.

edited Aug 01 '11 at 20:51

answered Aug 01 '11 at 20:31

EvilTeach

28,120
21
85
141

1

ORA_ROWSCN doesn't seem to be 100% reliable, as stated here: http://psoug.org/reference/pseudocols.html "It is not absolutely precise, because Oracle tracks SCNs by transaction committed for the block in which the row resides." thanks for the suggestion anyway. – Maxime Pacary Aug 01 '11 at 20:50

How to merge rows + retrieve new and existing keys

4 Answers4