7

When using a database transaction to group multiple updates, should I include SELECTs inside the transaction as well? For instance, lets say I:

  1. get a record
  2. check edit permissions for that record, using data from the record
  3. update some records
  4. update some other records

Should I start the transaction before the "get a record" stage, or just around the updates?

I'm using Postgres/Django transaction.atomic() but I don't think it matters here.

Scott Stafford
  • 43,764
  • 28
  • 129
  • 177

2 Answers2

6

The short version: "It depends".

The long version:

If you're doing a read-modify-write cycle, then not only must it be in a transaction, but you must SELECT ... FOR UPDATE any records you later intend to modify. Otherwise you're going to risk lost writes, where you overwrite an update someone else made between when you read the record and when you wrote the update.

SERIALIZABLE transaction isolation can also help with this.

You really need to understand concurrency and isolation. Unfortunately the only simple, easy "just do X" answer without understanding it is to begin every transaction by locking all the tables involved. Most people don't want to do that.

I suggest a read (or two, or three, or four - it's hard material) of the tx isolation docs. Experiment with concurrent psql sessions (multiple terminals) to create race conditions and conflicts.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    A note for Django users: `SELECT ... FOR UPDATE` is available through Django's `select_for_update` (https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.select_for_update) – Scott Stafford Jul 02 '14 at 14:32
  • 1
    @ScottStafford ... but not unfortunately `SELECT ... FOR SHARE`, or PostgreSQL's `SELECT ... FOR KEY SHARE` / `FOR KEY UPDATE`. – Craig Ringer Jul 02 '14 at 14:35
  • So, to make sure I follow with the Django angle, a call like `MyObject.objects.get(pk=5)` will not lock anything whether inside a `transaction.atomic()` or not. To do that I need to be both inside a `transaction.atomic()` AND use a form like `MyObject.objects.select_for_update().get(pk=5)`, to make sure that MyObject 5 cannot be changed until after my transaction ends. – Scott Stafford Jul 02 '14 at 18:20
1

Ideally (if possible) you would do all of your four steps in a single data-modifying CTE (which automatically happens inside a single transaction).

That still does not rule out race conditions, just makes them very unlikely, because the time frame between SELECT .. FOR UPDATE and a later UPDATE is minimized. (Yes, you still should use FOR UPDATE (or another appropriate locking level) to counter race conditions under heavy concurrent access.)

This is not the typical (inefficient) approach of a web-framework like Django. But it's the superior approach. It optimizes performance in a number of ways:

  • Fewer round trips to the db server (probably most important)
  • Minimize lock times
  • Allow Postgres to optimize queries

When using SELECT .. FOR UPDATE in a data-modifying CTE, be aware that unreferenced CTEs are not executed at all, which would also not lock rows as intended.

Code examples for data-modifying CTEs:

There are many more on SO. Try a seach.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Very interesting. In my case, I can afford multiple roundtrips to the DB in exchange for code maintainability and keeping authorization logic in one place in code. I'm looking for transactional correctness -despite- the games the ORM plays, here. Also, CTEs frighten me. – Scott Stafford Jul 02 '14 at 18:17
  • @ScottStafford: SQL or PL/pgSQL server-side functions may be less frightening alternatives with similar benefits ... – Erwin Brandstetter Jul 02 '14 at 18:25