
How can we make a long-running transaction free of "integrity errors" (especially foreign-key integrity errors) while using Django?

Our application has entities 'A' and 'B' (named so for the sake of simplicity), with the following semantics.

'B' has a foreign key field that points to 'A'. Periodically, multiple 'B's (hundreds of thousands) get created when an event takes place. Creation of the multiple 'B's must always be transactional: we build all of the 'B's or none, so that in case of error we can re-trigger the event.
Intermittently, some 'A's may get destroyed. In other words, we cannot control the lifecycle of 'A'.

I am currently using Django's basic transaction functionality (`transaction.atomic`) to ensure the atomic creation of the 'B's.
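A minimal sketch of that setup (the model import path and the FK field name `a` are assumptions for illustration):

    from django.db import transaction
    from myapp.models import B  # hypothetical app/model location

    def build_all_bs(event_a_rows):
        # Either every B row is committed, or none are; on error we can
        # simply re-trigger the event.
        with transaction.atomic():
            for a in event_a_rows:        # hundreds of thousands of iterations
                B.objects.create(a=a)     # FK field name is assumed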

Since building the 'B's takes some time (an hour or so), if an already-referenced 'A' gets deleted in the meantime, the whole transaction fails with a ForeignKey integrity error.

For the sake of clarity, let me explain with a crude diagram.

The B's get created sequentially, as follows:

[ B1 (references A1), B2 (references A2) ... Bn (references An) ], and the whole operation is committed only once all of B1...Bn have been created.

While Bm (1 <= m < n) is being created, Ak (1 <= k < m) gets destroyed and that deletion is committed to the database. The ForeignKey integrity error is then raised only after we have built all of the B's (an hour later), and the whole operation fails.

If we could control the lifecycle of 'A', this wouldn't be a problem. If building the multiple 'B's were quick, this wouldn't be a problem either.

Since building the 'B's takes so much time, retrying only after we hit the integrity error doesn't work well in the production environment.

In the best-case scenario, the integrity error would be suppressed and the offending row (the one referencing an already-deleted 'A') silently deleted.

I am currently using Django with PostgreSQL. Any comment is greatly appreciated.


Another requirement is that locking table "A" (whether table-level or row-level) shouldn't be used, since querying, modifying, and inserting (everything except delete) must remain available regardless of whether I am building "B"s or not. "A" is a user-information table, which should be open to alteration at any time (we can defer the deletion if needed).

Sihyun Joo

2 Answers


> if an already-referenced 'A' gets deleted in the meantime, the whole transaction fails with a ForeignKey integrity error.

You need to prevent this from happening. The best way is to take appropriate locks on the "A" objects before starting work.

I recommend that, in the transaction that builds the "B"s, you first `SELECT 1 FROM "A" WHERE ... FOR SHARE`, or use any Django-level abstraction that does the same underlying `SELECT ... FOR SHARE` of the rows to be referenced.

`SELECT ... FOR UPDATE` is also OK; it just takes a stronger lock than you need.
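For concreteness, a sketch of that with a raw cursor inside the build transaction (the table name `"A"` and `id` column are assumptions; substitute your real table name, e.g. `myapp_a` under Django's default naming):

    from django.db import connection, transaction

    def build_all_bs(a_ids):
        with transaction.atomic():
            with connection.cursor() as cursor:
                # Share-lock every referenced "A" row; concurrent DELETEs of
                # those rows will block until this transaction commits.
                cursor.execute(
                    'SELECT 1 FROM "A" WHERE id = ANY(%s) FOR SHARE',
                    [list(a_ids)],
                )
            # ... create the B rows here; the locks are released at commit ...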

Django probably calls this "explicit" or "pessimistic" locking. (Googles.) Yup: it's exposed as `QuerySet.select_for_update()`.

It doesn't seem to support `FOR SHARE`, but I guess that's the kind of limitation you have to live with when using an ORM. It'll only be a problem if multiple things need to prevent the same "A" object from being deleted at the same time, because with `FOR SHARE` they can both proceed concurrently, whereas with `FOR UPDATE` they have to wait serially for each other.
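For reference, a sketch of the ORM route with that stronger lock (the model name and `a_ids` are assumed):

    from django.db import transaction
    from myapp.models import A  # hypothetical app/model location

    with transaction.atomic():
        # select_for_update() emits SELECT ... FOR UPDATE; it must run inside
        # a transaction, and the rows stay locked until the commit.
        locked_as = list(A.objects.filter(id__in=a_ids).select_for_update())
        # ... create the B rows that reference locked_as ...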

--

After your comment clarification: it sounds like what you really want is a feature added in PostgreSQL 9.3, `FOR KEY SHARE`, which locks only the primary key of the referenced row, so you can still change the other columns. You'd have to use native SQL to do that, of course; it's a PostgreSQL-specific extension to the SQL-standard `FOR SHARE` and `FOR UPDATE`.
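A minimal sketch of that, assuming PostgreSQL 9.3+ and the same hypothetical table name; only the lock clause differs from the `FOR SHARE` example above:

    from django.db import connection

    # Run inside the same transaction.atomic() block as the B inserts.
    with connection.cursor() as cursor:
        # FOR KEY SHARE blocks DELETEs (and key-column updates) on the locked
        # "A" rows, while ordinary updates of other columns still proceed.
        cursor.execute(
            'SELECT 1 FROM "A" WHERE id = ANY(%s) FOR KEY SHARE',
            [list(a_ids)],
        )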

Craig Ringer
  • Thanks for the info. Like you said, if table-level locking is applied, building "B" will always succeed. But the problem is that it also prevents "A" from being accessed and modified. For the record, "A" is a model that represents a user, so new-user insertion and user-info updates should not be blocked by the operation of building "B". It was my fault not to specify the exact requirement, my bad. By the way, thank you for your thoughtful comment. I had only known about the `FOR UPDATE` clause, not the `FOR SHARE` clause. I will make sure to accept your answer if no better answer is given! – Sihyun Joo Jul 02 '14 at 12:28
  • `FOR UPDATE` and `FOR SHARE` are row-level, not table-level, locking. Given the additional explanation supplied, it sounds like what you really want is a feature added in PostgreSQL 9.3, `FOR KEY SHARE`, which locks only the primary key of the referenced value, so you can still change the other columns. You'd certainly have to use native SQL to do that, of course. – Craig Ringer Jul 02 '14 at 13:06
  • Oops! Thank you for the correction. Yes, you are right that `FOR UPDATE` and `FOR SHARE` are row-level. However, since the number of "B"s that get created is quite large, row-level locking isn't sufficient to solve the issue. I will definitely look into `FOR KEY SHARE`; I had never heard of it before (unfortunately I am using PostgreSQL 9.2.4, though). Thank you again – Sihyun Joo Jul 02 '14 at 17:42
  • Well, if you can't lock the records and can't manage their life cycle externally, you have to retry if they vanish while you work, which you said is a problem. Your other options are: drop the foreign key relationship, or insert 'B' records individually, conditional on their 'A' still existing (read committed isolation). – Craig Ringer Jul 02 '14 at 23:44
> Since building the 'B's takes some time (an hour or so), if an already-referenced 'A' gets deleted in the meantime, the whole transaction fails with a ForeignKey integrity error.

How about using the `pre_delete` signal? You would need something to check whether B's transaction is in progress. If it is, raise an error in the signal; if not, just let the A object be removed.

PS: I don't know if this will work; it's just an idea that came to mind while thinking about the problem. Hope it does!

edit: Further explanation


    from django.db import models
    from django.db.models.signals import pre_delete
    from django.dispatch import receiver

    class B(models.Model):
        # Plain class attribute, not a database field. Note it lives in
        # process memory, so each worker process has its own copy.
        transaction_active = False

    def mylongtransactionview(request):
        B.transaction_active = True
        try:
            go_long_transaction()
        finally:
            # Reset the flag even if the long transaction fails.
            B.transaction_active = False

    @receiver(pre_delete, sender=A)  # A is the referenced user model
    def mypre_delete_A_signal(sender, instance, **kwargs):
        if B.transaction_active:
            raise Exception("cannot delete A while B's build transaction runs")
    
argaen
  • That does seem to make sense! Is there any simple mechanism that could work as a flag to mark the transaction as running, without incurring too much overhead? Building a separate table, or using some other external data source to track the lifecycle, seems a bit shady. Is there any way to know whether there is a live, running insertion transaction on a specific table? Excellent suggestion! Thank you! – Sihyun Joo Jul 02 '14 at 17:46
  • Mmmm, as a first thought, maybe have an attribute on class B itself, so that when your view triggers this long transaction, you can set the attribute to True (transaction being processed). I don't mean a table attribute, I mean a class attribute, but I don't know if it's possible in Django to create class attributes that are not synced with the database... To be honest, I've never done this before (though it would be good to know if this works; I'll also try it when I have time :D). – argaen Jul 02 '14 at 19:26
  • I've added some "pseudo"code to show how it would look. I don't know if there is a cleaner way to do this, but it seems pretty easy and clear to implement it this way. – argaen Jul 03 '14 at 13:42