2

Is anyone using Rails 4.x or Rails 5.x and the Postgres (9.4+) jsonb column? I understand the benefits of using jsonb + Postgres - you get to mix schemaless data with structured data. Some might call this a happy medium.

We've noticed it's easy to (accidentally or intentionally) overwrite jsonb data when updating a row. Has anyone seen any techniques to make the jsonb column append-only?

Suppose a row has ajsonbcolumn, let's call it "questionnaire". The questionnaire column has the following lifecycle:

  1. starts as null
  2. some json is saved (via UPDATE operation)
  3. another UPDATE operation occurs to append more json this time it's a json array
  4. another UPDATE operation occurs, but this time with "{}" which is valid json
  5. all the data stored in "questionnaire" is now invalid.

This is bad because we have essentially lost all data.

More broadly, how have others implemented append only tables (or columns) in Rails or Postgres? Is this problem best tackled at the db layer or application layer?

Appreciate the thoughts. Thanks in advance!

Henry
  • 926
  • 2
  • 12
  • 27

1 Answers1

2

As you noticed, json[b] values (just like any other type in PostgreSQL) can only be UPDATEed as a whole.

8.14.2. Designing JSON documents effectively:

JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently.

So, one obvious solution for you is to divide your JSON array & store its elements instead (f.ex. in a junction table, with one-to-many relation to your original table).

However, you could avoid these "lost updates" in a few other ways too (but these are really not those ideal ways).

  1. Atomic UPDATEs

Let me introduce you to an analogy. If you want to make a counter in any RDBMS, you usually do this:

UPDATE counter SET value = value + 1

And this is (of course) not subject to lost updates. However, when you do

SELECT value FROM counter
-- do something in client & bind the selected value + 1 to the next query:
UPDATE counter SET value = ?

It is subject to lost updates. Because, between the SELECT & UPDATE statement, another transaction could update the value before the current one. If that happened, those UPDATEs are lost. You most probably do this kind of UPDATE with your jsonb column.

The first statement's jsonb counterpart could look like one of these:

-- to append a JSON array element to the root JSON array
UPDATE t SET jsonb_col = jsonb_col || '[{"a":1}]'; 
-- to append a JSON array element to an array, located on the path: 'a' (requires 9.6+)
UPDATE t SET jsonb_col = jsonb_insert(jsonb_col, ARRAY['a', '-1'], '{"a":1}', TRUE);
-- Notes: TRUE means that insert AFTER ... -1 (in the path) means after the LAST ELEMENT

However, these are (usually) hard to achieve with ORMs.

  1. Locking

If you can't use the queries above, then you must ensure that only one transaction can UPDATE a row in your table at a time.

2/A. Pessimistic locking

This way, you tell the RDBMS explicitly that you SELECTed a row for a specific reason: FOR UPDATE. F.ex. ActiveRecord supports this.

2/B. Optimistic locking

With this, you must use/include a version column in your UPDATE, i.e.:

UPDATE t
SET    jsonb_col = ?,
       t_version = t_version + 1
WHERE  t_version = ?

This way, there is no way to loose an UPDATE, but your statements might not do anything at all. You must check the row-count yourself (in your client) & re-try if it didn't update any rows.

F.ex. ActiveRecord supports this too.

Read more on these: Optimistic vs. Pessimistic locking

  1. Serializable transactions

Serializable transactions works like an Optimistic locking-based solution, except it doesn't require a special, version column. Instead, the RDBMS will use predicate-locking to avoid lost updates. Also, you are expected to re-try entire transactions, when serialization failure occurs.

Community
  • 1
  • 1
pozs
  • 34,608
  • 5
  • 57
  • 63