10

I'm looking for a way to manage optimistic concurrency control across more than one table in Postgres. I'm also trying to keep business logic out of the database. I have a table setup something like this:

CREATE TABLE master
(
    id SERIAL PRIMARY KEY NOT NULL,
    status VARCHAR NOT NULL,
    some_value INT NOT NULL,
    row_version INT NOT NULL DEFAULT(1)
);

CREATE TABLE detail
(
    id SERIAL PRIMARY KEY NOT NULL,
    master_id INT NOT NULL REFERENCES master ON DELETE CASCADE ON UPDATE CASCADE,
    some_data VARCHAR NOT NULL
);

master.row_version is automatically incremented by a trigger whenever the row is updated.
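For reference, such a trigger might be written like this (the function and trigger names are illustrative, not my actual schema):

```sql
CREATE OR REPLACE FUNCTION bump_row_version() RETURNS trigger AS $$
BEGIN
    -- Increment the version on every UPDATE of the row.
    NEW.row_version := OLD.row_version + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER master_bump_row_version
    BEFORE UPDATE ON master
    FOR EACH ROW
    EXECUTE PROCEDURE bump_row_version();
```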

The client application does the following:

  1. Reads a record from the master table.
  2. Calculates some business logic based on the values of the record, this may include a delay of several minutes involving user interaction.
  3. Inserts a record into the detail table based on logic in step 2.

I want step 3 to be rejected if the value of master.row_version has changed since the record was read at step 1. Optimistic concurrency control seems to me like the right answer (the only answer?), but I'm not sure how to manage it across two tables like this.

I'm thinking a function in Postgres with a row-level lock on the relevant record in the master table is probably the way to go. But I'm not sure if this is my best/only option, or what that would look like (I'm a bit green on Postgres syntax).

I'm using Npgsql, given that the client application is written in C#. I don't know if there's anything in it which can help me? I'd like to avoid a function if possible, but I'm struggling to find a way to do this with straight-up SQL, and anonymous code blocks (at least in Npgsql) don't support the I/O operations I'd need.

Snixtor
  • You can also lock the record in the master table for "younger" transactions until detail is updated. That way you know that no one else can modify the data while you do your calculations. The bigger question is what you consider right. – SMW Jun 14 '16 at 04:54
  • Regarding "younger" transactions, would you be referring to the [`SERIALIZABLE` isolation level](https://www.postgresql.org/docs/current/static/transaction-iso.html)? – Snixtor Jun 14 '16 at 05:13
  • Maybe you could check COUNT(detail.id) where master_id = $1 and then compare with the same check again while inserting using the technique Laurenz showed, and if it has changed try inserting NULL to cause an exception. – heyhugo Apr 12 '18 at 07:04

2 Answers

10

Locking is out if you want to use optimistic concurrency control, see the Wikipedia article on the topic:

OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources.

You could use a more complicated INSERT statement. If $1 is the original row_version and $2 and $3 are master_id and some_data to be inserted in detail, run

WITH m(id) AS
     (SELECT CASE WHEN master.row_version = $1
                  THEN $2
                  ELSE NULL
             END
      FROM master
      WHERE master.id = $2)
INSERT INTO detail (master_id, some_data)
   SELECT m.id, $3 FROM m

If row_version has changed, the CTE produces a single NULL, and this will try to insert NULL as detail.master_id, which will cause an
ERROR: null value in column "master_id" violates not-null constraint
that you can translate into a more meaningful error message. (If the master row has been deleted entirely, the CTE yields no rows and nothing is inserted, so you may also want to check the affected-row count.)

Laurenz Albe
  • Why is locking out? I can appreciate it might be a bit "belts and braces", in that optimistic concurrency control usually does away with the need for locking, but I didn't think there was any technical reason that they were mutually exclusive techniques. – Snixtor Jun 14 '16 at 23:48
  • Optimistic concurrency control is _defined_ as using no locks, compare the [Wikipedia article](https://en.wikipedia.org/wiki/Optimistic_concurrency_control). So what you're asking for is a contradiction in terms. The whole idea of optimistic concurrency control is that you _hope_ that nobody will change the data and you don't have to make sure by using a lock. – Laurenz Albe Jun 17 '16 at 08:58
  • I have quoted the Wikipedia article on the topic to back up my assertion (please remember that a read operation takes no row locks in PostgreSQL). You are of course free to disagree with what is common knowledge. If you feel that pessimistic concurrency control is the way to go for you, do it by all means. But if you use row locks, there is no need to make sure that the row has not been changed since you read it, because the lock will prevent that anyway. – Laurenz Albe Jun 20 '16 at 06:36
  • "if you use row locks, there is no need to make sure that the row has not been changed since you read it" - I don't think you've understood the scenario I've described. There definitely would be a need to make sure the row hasn't changed, per point 2 in the sequence of events I originally described. – Snixtor Jun 20 '16 at 06:58
  • Row locks, as any other locks, only hold until the end of the transaction. If you wanted to use pessimistic concurrency control, you'd `SELECT row_version FROM master WHERE ... FOR SHARE/UPDATE` and then you'd have to keep the transaction open until your step 2 is completed (which would be a bad idea for reasons outside the scope of this comment). Nobody else can change the row in `master` during that time, no need to check. You can either use this so-called pessimistic CC or optimistic CC as I described in my answer; mixing them would not bring any benefits. – Laurenz Albe Jun 20 '16 at 07:43
10

I've since come to the conclusion that a row lock can be employed in a "typical" pessimistic concurrency control approach, but when combined with a row version can produce a "hybrid" approach with some meaningful benefits.

Unsurprisingly, the choice of pessimistic, optimistic or "hybrid" concurrency control depends on the needs of the application.

Pessimistic Concurrency Control

A typical pessimistic concurrency control approach might look like this.

  1. Begin database transaction.
  2. Read (and lock) record from master table.
  3. Perform business logic.
  4. Insert a record into detail table.
  5. Commit database transaction.

If the business logic at step 3 is long-running, this approach may be undesirable as it leads to a long-running transaction (generally unfavourable), and a long-running lock on the record in master which may be otherwise problematic for concurrency.
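A minimal SQL sketch of the steps above, assuming the tables from the question:

```sql
BEGIN;

-- Step 2: read and lock the master record; concurrent writers block
-- on this row until COMMIT.
SELECT some_value FROM master WHERE id = $1 FOR UPDATE;

-- Step 3: business logic runs here, while the lock is held.

-- Step 4: insert the result.
INSERT INTO detail (master_id, some_data) VALUES ($1, $2);

COMMIT;
```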

Optimistic Concurrency Control

An approach using only optimistic concurrency control might look more like this.

  1. Read record (including row version) from master table.
  2. Perform business logic.
  3. Begin database transaction.
  4. Increment row version on record in master table (an optimistic concurrency control check).
  5. Insert a record into detail table.
  6. Commit database transaction.

In this scenario, the database transaction is held for a shorter period of time, as are any (implicit) row locks. But the increment of the row version on the record in the master table may be misleading to concurrent operations: imagine several concurrent runs of this scenario. They'll start failing the optimistic concurrency check because the row version has been incremented, even though the meaningful properties of the record haven't changed.
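A sketch of steps 3-6 (the version bump is written explicitly here; with the trigger from the question, any UPDATE of the row would increment it anyway):

```sql
BEGIN;

-- Step 4: the optimistic check. $2 is the row_version read at step 1.
UPDATE master
   SET row_version = row_version + 1
 WHERE id = $1 AND row_version = $2;
-- If this reports "UPDATE 0", the row changed since step 1: ROLLBACK.

-- Step 5
INSERT INTO detail (master_id, some_data) VALUES ($1, $3);

COMMIT;
```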

Hybrid Concurrency Control

A "hybrid" approach uses both pessimistic locking and (sort of) optimistic locking, like this.

  1. Read record (including row version) from master table.
  2. Perform business logic.
  3. Begin database transaction.
  4. Re-read the record from the master table based on its ID and row version (an optimistic concurrency control check of sorts) AND lock the row.
  5. Insert a record into detail table.
  6. Commit database transaction.

If step 4 fails to obtain a record, this should be considered an optimistic concurrency control check failure. The record has been changed since step 1 so the business logic is no longer valid.
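In SQL, step 4 might look like this (a sketch against the question's schema):

```sql
BEGIN;

-- Step 4: re-read by ID *and* version, locking the row.
-- Zero rows returned means the version has changed: treat it as an
-- optimistic concurrency failure and ROLLBACK.
SELECT id FROM master
 WHERE id = $1 AND row_version = $2
 FOR UPDATE;

-- Step 5
INSERT INTO detail (master_id, some_data) VALUES ($1, $3);

COMMIT;
```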

Like the typical pessimistic concurrency control scenario, this involves a transaction and an explicit row lock, but the duration of the transaction+lock no longer includes the time necessary to perform the business logic.

Like the optimistic concurrency control scenario, the record requires a version. But where it differs is that the version is not updated, which means other operations depending on that row version won't be impacted.

Example of Hybrid Approach

An example of where the hybrid approach might be favourable:

A blog has a post table and comment table. Comments can be added to a post only if the post.comments_locked flag is false. The process for adding comments could use the hybrid approach, ensuring users can concurrently add comments without any concurrency exceptions.

The owner of the blog may edit their post, in which case the conventional optimistic concurrency control approach could be employed. The owner of the blog can have a long-running edit process which won't be affected by users adding comments. When the post is updated to the database, the version will be incremented, which means any in-progress comment-adding operations will fail, but they could be easily retried with a database-wins approach of re-fetching the post record from the database and retrying the process.

Snixtor
  • Is there a constraint between the records in the `master` and `detail` tables? If there is then you have no solution here, because a subsequent transaction that updates only the record in the `master` table will leave the database inconsistent. On the other hand, if there is no constraint, then inserting a result in the `detail` table based on a stale read is no problem, so in this case all this effort seems useless. Or am I missing something that is not clearly explained in your question or answer? – mljrg Jun 18 '18 at 13:55
  • This was actually helpful for me as I was trying to figure out a solution to this problem. In my view, it is the responsibility of the application to ensure that every change it makes is consistent. The case in this answer involves the application making a change to the `detail` table with a dependency on `master`. Any change made to the `master` table by the application will have a similar check of the `detail` table as needed. – Dark Nebula Jul 23 '20 at 15:23