Managing concurrency for data insertion in DB2

Question

I am currently doing a research and would like to know what methods or algorithms are currently present to manage multi-user data insertion into two relational table (Related through Primary Key and Foreign Key).

In the primary key table,
- Primary key can either be auto-increment id or non auto-increment id.

In the foreign key table
- Every time a new record is inserted into primary key table, 2 records have to be inserted into foreign key and pointing to the primary keys of new record

Currently i am doing it like this.
1. I have 2 table (Primary Key Table with Non-Increment) (Foreign key table)
2. I have another table (counter) storing the largest primary key value
3. Every time a user insert a new record, it will read the largest primary key value + 1
4. It will then begin transaction and use commit statement to insert three records into the primary key table and foreign key table respectively

My concern is what are the alternatives available in the market?

Why you do not use auto increment? I did not understood the question for alternatives - alternative DB other than DB2? — i486, Nov 20 '14 at 09:19
see http://stackoverflow.com/questions/282099/whats-the-hi-lo-algorithm — Nathan Hughes, Nov 20 '14 at 18:19
The reason because I am not using auto increment is because of this scenario. 1. I insert a record in the primary key table and the record is at the ID 50 (auto increment) 2. When I want to insert two record into the foreign key table, I have to get back the Primary Key maybe using MAX(ID). 3. However, if between the gap of inserting record and retrieving the max(ID), if another user inserts another record, the result will return 51 in stead of 50. 4. This cause incorrect records in the foreign key table. — Ping, Nov 21 '14 at 01:29

score 1 · Accepted Answer · answered Nov 20 '14 at 17:10

If I understand your question correctly, you are concerned about determining the just inserted parent record identity value that you need to use in the child records. There are several ways of accomplishing this (I'm assuming here we are talking about DB2 for Linux, Unix, and Windows, since you did not indicate your DB2 version and platform).

If you define the parent PK column with an IDENTITY attribute, you can reference the just generated identity value using the built-in function IDENTITY_VAL_LOCAL().

If you use a SEQUENCE to generate key values, you would use the reference NEW VALUE FOR myseq when inserting into the parent table and PREVIOUS VALUE FOR myseq for the child table.

Finally, you can use the feature called "data change table reference", which allows you to obtain the results of a DML statement:

SELECT yourpk_column FROM FINAL TABLE ( INSERT INTO yourparent_tab... )

You can even insert into both parent and child tables with a single statement (in this example person.person_id is defined as IDENTITY):

WITH parent (person_id) AS (  
  SELECT person_id FROM FINAL TABLE (
    INSERT INTO person (first_name, last_name) VALUES ('John', 'Doe')   
  )
)
SELECT * FROM NEW TABLE (
  INSERT INTO person_phone (person_id, type, number)
     SELECT person_id, 'Office', '555-555-1234' FROM parent
    UNION ALL
     SELECT person_id, 'Home', '555-555-6789' FROM parent)

Sorry for not specifying my database environment. I am using IBM DB2 for VB Application. I am interested in the third way using "data change table reference". If the third statement fails whereby INSERT INTO New Table "HOME", "555-555-6789" fails, will the first insert statement into Final Table still take effect? — Ping, Nov 21 '14 at 01:40
SQL statements are atomic, so if anything fails within a statement none of the changes made by that statement would persist. — mustaccio, Nov 21 '14 at 02:59
Mustaccio, thanks for your reply, but now I have another concern. If I have a function which past three insert statement as Parameter. How do I convert these three statement into "data change table reference" Format Statement. Given First statement is always the parent statement — Ping, Nov 21 '14 at 03:07
IDENTITY_VAL_LOCAL() if I were to use it in a transaction. Will it only get the commited insert statement? — Ping, Nov 21 '14 at 03:12
`IDENTITY_VAL_LOCAL()`, as the identity generation itself, does not depend on transactions. — mustaccio, Nov 21 '14 at 19:16

score 1 · Answer 2 · answered Nov 21 '14 at 17:56

As others have said, you can use IDENTITY provided by the DBMS.

However, if you wish to use your own counter, but also plan for a concurrent process, then you can take two approaches: lock the counter that you are updating, or double-check that no other process updated it and re-try if it did.

To lock, you can select ... for UPDATE (select while setting concurrency-mode as UPDATE). This will allow you to read the value from a cursor, update the current row with the record being locked between your read and your update. You will want to release the lock immediately after the update; so, if the INSERTs fail for some reason, you will end up with some sequence numbers that are never used (which is often not a problem).

The other way it to generate a GUID type of value (though a TIMESTAMP may be good enough) and update that as you are incrementing the sequence number. When you fetch, you can then double-check that the GUID has not changed. If it has, it means another process did an update. So, you have to retry.

Hopefully this demonstrates why it might be easier to leave the hard work to the DBMS. OTOH, it is not as complex as it seems and there are practical advantages to using your own counter.

It is the first time i heard about GUID. May i know if i want to generate another unique identifier like GUID. What are the components you suggest. Curretly components can think of is timestamp and IP address and userID — Ping, Nov 23 '14 at 02:23
For our information, i do not want to use GUID because i want my end result format to be long = 19 digits — Ping, Nov 23 '14 at 02:24

score 0 · Answer 3 · answered Nov 20 '14 at 18:12

0

Using a "counter" table to keep track of one or more IDs like you describe is a design pattern that often results in very high contention on the row(s) in the counter table and a massive concurrency headache, even with only a moderate rate of new key generation.

DBMSs solved this problem a long time ago with IDENTITY columns (aka "auto-increment") and/or SEQUENCES.

Even with an identity or sequence it is possible to run into concurrency issues at very, very high insert rates when you have an index on the column storing these values. Identities and sequences are typically monotonic, so you can get into a situation where write contention on the "highest" index page for the column storing these values can cause concurrency issues.

answered Nov 20 '14 at 18:12

Ian Bjorhovde

10,916
1
28
25

The reason because I am not using auto increment is because of this scenario. 1. I insert a record in the primary key table and the record is at the ID 50 (auto increment) 2. When I want to insert two record into the foreign key table, I have to get back the Primary Key maybe using MAX(ID). 3. However, if between the gap of inserting record and retrieving the max(ID), if another user inserts another record, the result will return 51 in stead of 50. 4. This cause incorrect records in the foreign key table. How do you solve this issue? – Ping Nov 21 '14 at 01:34
What you're describing is not an issue with using auto increment; it's an issue in your logic (i.e., assuming that `MAX(ID)` corresponds to the row you just inserted). @mustaccio's solution of using `FINAL TABLE` eliminates the problem. – Ian Bjorhovde Nov 25 '14 at 17:36

Managing concurrency for data insertion in DB2

3 Answers3