
I have been thoroughly reading the PostgreSQL documentation on transaction isolation, suggested in another of my questions, but I still have not managed to understand the "predicate locking" stuff.

I hope somebody can enlighten me :-)

According to the documentation: "Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction."

That sounds good, then why is the following happening?

CREATE TABLE mycustomer(cid integer PRIMARY KEY, licenses integer);
CREATE TABLE mydevice(id integer PRIMARY KEY,
                      cid integer REFERENCES mycustomer (cid),
                      status varchar(10));

INSERT INTO mycustomer(cid, licenses) VALUES (1, 5);
INSERT INTO mycustomer(cid, licenses) VALUES (2, 5);

    Request 1                            Request 2
BEGIN TRANSACTION ISOLATION 
LEVEL SERIALIZABLE;
                                         BEGIN TRANSACTION ISOLATION 
                                         LEVEL SERIALIZABLE;
SELECT * from mydevice where cid = 1;

                                         SELECT * from mydevice where cid = 2;
INSERT INTO mydevice(id, cid, status) 
VALUES (1, 1, 'ok');

                                         INSERT INTO mydevice(id, cid, status)         
                                         VALUES (2, 2, 'ok');
commit;
(=ok)                                 
                                         commit;
                                         (=rollback)

I understand that the inserts from request 1 and request 2 do not conflict with the previous reads, so no error should be raised. Why am I getting "ERROR: could not serialize access due to read/write dependencies among transactions"?

As you can imagine, I cannot live with this behavior, since every concurrent request would be rolled back regardless of its details. In my business scenario, I would like concurrent requests to be rolled back only when they insert data (devices, as in the example) for the same single customer.

These operations are performed from a Java application, and in principle I am thinking about creating a locking table to satisfy my needs. Any ideas?
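Roughly what I have in mind, as a sketch (untested; the customer_lock table, the method name, and the connection handling are all mine):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DeviceDao {
    // Assumes a dedicated lock table with one row per customer, e.g.:
    //   CREATE TABLE customer_lock (cid integer PRIMARY KEY
    //                               REFERENCES mycustomer (cid));
    //   INSERT INTO customer_lock SELECT cid FROM mycustomer;
    public void insertDevice(Connection conn, int id, int cid) throws SQLException {
        conn.setAutoCommit(false); // plain READ COMMITTED transaction
        try {
            // Lock this customer's row: concurrent inserts for the same
            // customer block here, while other customers proceed freely.
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT 1 FROM customer_lock WHERE cid = ? FOR UPDATE")) {
                lock.setInt(1, cid);
                lock.executeQuery();
            }
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO mydevice(id, cid, status) VALUES (?, ?, 'ok')")) {
                insert.setInt(1, id);
                insert.setInt(2, cid);
                insert.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}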

Many thanks!

  • Thank you for your follow-up, Craig. As you suggested, I updated to PostgreSQL 9.2.1 (it is in the subject) and that is when I started having this "issue". – Javier Moreno Garcia Oct 11 '12 at 12:12
  • Oh, duh; I totally missed the title. My bad. – Craig Ringer Oct 11 '12 at 12:13
  • I'm surprised that these two conflict. I'll ask about it on the Pg mailing list. I wonder if it's a matter of lock granularity - say, if it's locking on a page level, not a tuple level. In this case you could achieve the desired effect with non-`SERIALIZABLE` transactions and a `SELECT 1 FROM mycustomer WHERE cid = 1 FOR UPDATE` at the start of each tx to prevent concurrent updates to records owned by the same customer. You can add it to a trigger `BEFORE INSERT OR UPDATE OR DELETE` on `mydevice`, but you'll get deadlock aborts sometimes, so it's better to do it directly from the client (see the sketch after this comment). – Craig Ringer Oct 11 '12 at 12:25
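A rough JDBC rendering of Craig's client-side `FOR UPDATE` suggestion (a sketch only, not from the thread; the class and method names are mine, and it assumes the schema from the question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class CustomerLocks {
    // Take a row lock on the owning customer at the start of the
    // transaction (autocommit off). Writers for the same customer then
    // queue up even under plain READ COMMITTED, while transactions for
    // other customers are unaffected.
    public static void lockCustomer(Connection conn, int cid) throws SQLException {
        try (PreparedStatement lock = conn.prepareStatement(
                "SELECT 1 FROM mycustomer WHERE cid = ? FOR UPDATE")) {
            lock.setInt(1, cid);
            lock.executeQuery();
        }
    }
}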

2 Answers


From the Transaction Isolation page:

The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks.

...

  • A sequential scan will always necessitate a relation-level predicate lock. This can result in an increased rate of serialization failures.

An EXPLAIN on that SELECT can tell you which query plan is being used, but if the table is small (or empty!), PostgreSQL will almost certainly pick a sequential scan instead of using the index. This causes a predicate lock on the entire table, which in turn causes a serialization failure whenever another concurrent transaction does anything to the table.

On my system:

isolation=# EXPLAIN SELECT * from mydevice where cid = 1;
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on mydevice  (cost=0.00..23.38 rows=5 width=46)
   Filter: (cid = 1)
(2 rows)

You could try adding an index and forcing PostgreSQL to use it:

isolation=# CREATE INDEX mydevice_cid_key ON mydevice (cid);
CREATE INDEX
isolation=# SET enable_seqscan = off;
SET
isolation=# EXPLAIN SELECT * from mydevice where cid = 1;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Index Scan using mydevice_cid_key on mydevice  (cost=0.00..8.27 rows=1 width=46)
   Index Cond: (cid = 1)
(2 rows)

However, this is not the correct solution. Let's back up a little bit.

Serializable is meant to guarantee that transactions will have exactly the same effect as if they were run one after another, despite the fact that you're actually running these transactions concurrently. PostgreSQL does not have infinite resources, so while it's true that it puts predicate locks on data that your query actually accesses, "data" can mean more than "rows returned".

PostgreSQL chooses to flag serialization failures when it thinks there might be a problem, not when it's certain. (Hence how it generalizes row locks to page locks.) This design choice causes false positives, such as the one in your example. False positives are less than ideal, but they don't affect the correctness of the isolation semantics.

The error message is:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.

That hint is key. Your application needs to catch serialization failures and retry the whole operation. This is true whenever SERIALIZABLE is in play -- it guarantees serial correctness despite concurrency, but it can't do that without the help of your application. Put another way, if you're actually doing concurrent modifications, the only way PostgreSQL can satisfy the isolation requirements is to ask your application to serialize itself. Thus:

It is important that an environment which uses this technique have a generalized way of handling serialization failures (which always return with a SQLSTATE value of '40001'), because it will be very hard to predict exactly which transactions might contribute to the read/write dependencies and need to be rolled back to prevent serialization anomalies.
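In JDBC terms, a retry wrapper might look something like this (a sketch; the `Work` interface and the retry limit are my own choices, not something prescribed by PostgreSQL):

import java.sql.Connection;
import java.sql.SQLException;

public class SerializableRetry {
    private static final int MAX_ATTEMPTS = 5;
    private static final String SERIALIZATION_FAILURE = "40001";

    public interface Work {
        void run(Connection conn) throws SQLException;
    }

    // Runs a unit of work in a SERIALIZABLE transaction, re-running the
    // whole unit from scratch whenever it fails with SQLSTATE 40001.
    public static void inSerializableTx(Connection conn, Work work) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            conn.setAutoCommit(false);
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try {
                work.run(conn);
                conn.commit();
                return;
            } catch (SQLException e) {
                conn.rollback();
                // Per the HINT, the transaction might succeed if retried;
                // anything else (or too many attempts) propagates.
                if (!SERIALIZATION_FAILURE.equals(e.getSQLState()) || attempt == MAX_ATTEMPTS) {
                    throw e;
                }
            }
        }
    }
}

The important part is that the entire unit of work is re-executed, not just the statement that happened to fail.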

willglynn
  • Nicely explained. I forgot about the effects of seqscan with the new 9.1 serializability code. – Craig Ringer Oct 11 '12 at 23:30
  • @JavierMoreno BTW, the need to save the whole transaction's state until the transaction commits successfully - and be able to retry it if it fails - is not unique to `SERIALIZABLE`. You should almost always keep all the info needed to re-issue a transaction until you get a successful commit, because things like an admin restarting the server, a backend crash (rare but not impossible), a backend being terminated by an admin, disk-full, I/O errors, network errors, etc. can still cause your transaction to fail in ways that may be recoverable by re-issuing the transaction. – Craig Ringer Oct 11 '12 at 23:32
  • Your explanations are really helpful for understanding how things work. In the end I decided to use a locking table with isolation READ COMMITTED :-) Thank you guys. – Javier Moreno Garcia Oct 12 '12 at 06:53
  • So is this caused by the implementation of serialization in PostgreSQL? Is this something that will or could be changed to get the logical behavior as the question poster was expecting? – Kuberchaun Oct 12 '12 at 12:00
  • In principle, PostgreSQL could hold on to predicate locks encapsulating the `WHERE` condition of every query, and then evaluate all DML activity against all such locks. (Well, at least for the set of queries containing only non-volatile conditions.) In practice, I don't think that'll ever happen because it has significant performance implications. Even then, you still need to guard against serialization failures if you want to use `SERIALIZABLE`; that would just eliminate the false positive in the example. – willglynn Oct 12 '12 at 13:41
  • FYI, the "You could try adding an index and force it to use that" approach does avoid the error. But as stated, it is not the correct solution. – Kuberchaun Oct 12 '12 at 14:56

For those who are more curious: in the PostgreSQL 9.1 source code, if you look at src/backend/storage/lmgr/README-SSI, there is a lot of detailed description of predicate locking and serializable transactions.

Here's a snippet of the same:

Serializable Snapshot Isolation (SSI) and Predicate Locking
===========================================================

This code is in the lmgr directory because about 90% of it is an implementation of predicate locking, which is required for SSI, rather than being directly related to SSI itself. When another use for predicate locking justifies the effort to tease these two things apart, this README file should probably be split.

Credits:

This feature was developed by Kevin Grittner and Dan R. K. Ports, with review and suggestions from Joe Conway, Heikki Linnakangas, and Jeff Davis. It is based on work published in these papers:

 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008.
 Serializable isolation for snapshot databases.
 In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD
 international conference on Management of data,
 pages 729-738, New York, NY, USA. ACM.
 http://doi.acm.org/10.1145/1376616.1376690

 Michael James Cahill. 2009.
 Serializable Isolation for Snapshot Databases.
 Sydney Digital Theses.
 University of Sydney, School of Information Technologies.
 http://hdl.handle.net/2123/5353
Arvind
  • I'm too lazy to look there - if you're familiar with it, do you know off the top of your head whether Postgres does something more clever than range locks on a single index if possible, and table locks otherwise? I think that is what SQL Server does, and I'm wondering if Postgres does more. – John Dec 09 '13 at 17:05