
We are using Postgres as part of our Node.js backend (using pg). This is a highly concurrent, multi-process environment with a number of microservices that all query the same table. One column, 'status', essentially functions as a lock: its value is either 'pending' (unlocked) or 'in_process' (locked).

There are two queries which select data from the table and lock the corresponding rows:

   UPDATE market AS m
   SET status='in_process', status_update_timestamp='${timestamp}'
   WHERE m.guid IN 
          (SELECT guid FROM market
           WHERE status = 'pending' 
           ORDER BY created_at
           LIMIT 1 FOR UPDATE)
   RETURNING *

   UPDATE market AS m
   SET status = 'in_process', status_update_timestamp = '${timestamp}'
   WHERE m.guid IN 
          (SELECT guid FROM market
           WHERE status = 'pending' AND asset_id IN (${myArray.join(",")}) 
           FOR UPDATE)
   RETURNING *

And one query which unlocks rows based on their guids:

   UPDATE market
   SET status='pending', status_update_timestamp='${timestamp}'
   WHERE guid IN ('${guids.join("','")}')
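As an aside, interpolating the guids directly into the SQL string is injection-prone; with pg, the values can be passed as query parameters instead. A minimal sketch (the `placeholders` helper is illustrative, not from the original code), assuming pg's standard `$1, $2, ...` parameter syntax:

```javascript
// Hypothetical helper: build "$1, $2, ..." placeholders for an IN clause
// so the guid values can be passed as parameters instead of interpolated.
function placeholders(count) {
  return Array.from({ length: count }, (_, i) => `$${i + 1}`).join(', ');
}

const guids = ['guid-a', 'guid-b'];
const text = `UPDATE market
              SET status = 'pending', status_update_timestamp = now()
              WHERE guid IN (${placeholders(guids.length)})`;

// With pg, this would then be sent as: pool.query(text, guids)
```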

There are cases where the two selecting queries can block each other, and also cases where the unlocking query and one of the selecting queries block each other.

All of these queries can be executed in parallel from multiple services, and even though they are supposed to be atomic according to the documentation (link), we still get a 'deadlock detected' error from Postgres. We tried wrapping the queries in BEGIN and END, different isolation levels, and different ORDER BY clauses, but without any improvement.

Is there any problem in the queries that gives rise to deadlocks? Is this a problem that has to be solved in the application logic? Any help is welcome.

Table structure:

   CREATE TABLE market
   (
       id                      BIGSERIAL not null constraint market_pkey primary key,
       guid                    UUID      DEFAULT uuid_generate_v4(),
       asset_id                BIGINT,
       created_at              TIMESTAMP DEFAULT current_timestamp,
       status_update_timestamp TIMESTAMP DEFAULT current_timestamp,
       status                  VARCHAR   DEFAULT 'pending'
   );
  • See https://stackoverflow.com/questions/11532550/atomic-update-select-in-postgres - in particular, the most common way I've seen this solved is by using transactions with SELECT ... FOR UPDATE. If you're in a high-contention environment and don't care much which row you get, see SKIP LOCKED. Also, I should mention that this quote reveals the root cause of the problem: "a bunch of microservices, where the services query the same table". "Microservices" is a shared-nothing architecture; you may be working with a distributed monolith. – Robert Kawecki Mar 31 '21 at 09:58
  • We are using SELECT ... FOR UPDATE as mentioned in the question, and skipping rows does not solve our problem. What I meant by bunch of microservices is that all these queries are performed from the same service, but there are multiple instances of this service running in parallel. – Adar Slonim Mar 31 '21 at 10:09
  • You should not be getting deadlocks if all your transactions are in the same order: BEGIN, then SELECT ... FOR UPDATE, then the actual UPDATE and COMMIT. Perhaps there are other queries running in parallel that apply a different ordering of operations? – Robert Kawecki Mar 31 '21 at 10:11
  • The mentioned queries are the only queries that are accessing the table. – Adar Slonim Mar 31 '21 at 10:19

1 Answer


"Atomic" doesn't mean "can't fail". It just means that if it does fail, the whole thing gets rolled back completely.

You could solve the problem in the app by catching the deadlock errors and retrying them.
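A minimal sketch of such a retry wrapper, assuming pg surfaces the Postgres SQLSTATE on `err.code` ('40P01' is deadlock_detected); the function and parameter names are illustrative:

```javascript
// Retry a whole transaction when Postgres reports a deadlock.
// Assumes the driver exposes the SQLSTATE as err.code ('40P01').
async function withDeadlockRetry(fn, retries = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.code !== '40P01' || attempt > retries) throw err;
      // Brief backoff before re-running the whole transaction.
      await new Promise((resolve) => setTimeout(resolve, 50 * attempt));
    }
  }
}
```

The callback passed in should run the entire BEGIN ... COMMIT sequence, since Postgres rolls back the whole transaction on a deadlock, not just the failed statement.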

Perhaps you could redesign your transactions to be less prone to deadlock, but without knowing the rationale behind each query it is hard to suggest how you would go about doing that.
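One common redesign (illustrative, under the assumption that the deadlocks come from two transactions locking overlapping row sets in different orders): make every query acquire its row locks in the same total order. The second query in the question locks multiple rows with no ORDER BY, so adding a deterministic sort on a unique column is the usual first step; the `asset_id` values below are placeholders:

```sql
-- Sketch: lock rows in a fixed, total order (here by guid) so that two
-- concurrent transactions cannot acquire overlapping locks in opposite orders.
UPDATE market AS m
SET status = 'in_process', status_update_timestamp = now()
WHERE m.guid IN
      (SELECT guid FROM market
       WHERE status = 'pending' AND asset_id IN (1, 2, 3)
       ORDER BY guid
       FOR UPDATE)
RETURNING *;
```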

jjanes