
While researching a deadlock issue, I found the following post:

https://rails.lighthouseapp.com/projects/8994/tickets/6596

The gist of it is as follows:

  1. the MySQL docs say:

    Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.

  2. Therefore, debugging transient deadlocks is an antipattern, because MySQL says they are expected and unavoidable.

  3. Therefore, Rails should offer us a way, because it:

    makes the assumption that there is the "best" way to do things, and it's designed to encourage that way

  4. but Rails doesn't offer us a way, so we are left using a hacky DIY retry mechanism.

So if all of this is true, where is the Rails solution?


NOTE: This project is inactive, but it seems simple enough to serve as a solution. Why does Rails not have something like this? https://github.com/qertoip/transaction_retry
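For reference, the "DIY thing" in question boils down to a retry wrapper along these lines. This is only a sketch: `DeadlockError` and `with_deadlock_retry` are illustrative names standing in for whatever deadlock exception your adapter raises (since Rails 5.1, ActiveRecord raises `ActiveRecord::Deadlocked`):

```ruby
# Illustrative stand-in for the adapter's deadlock exception.
class DeadlockError < StandardError; end

# Re-issue the block up to `retries` times when a deadlock-style error
# is raised, with a small backoff between attempts.
def with_deadlock_retry(retries: 3, error_class: DeadlockError)
  attempts = 0
  begin
    yield
  rescue error_class
    attempts += 1
    raise if attempts > retries
    sleep(0.01 * attempts) # brief backoff before re-issuing
    retry
  end
end

# Example: an operation that deadlocks twice, then succeeds.
calls = 0
result = with_deadlock_retry do
  calls += 1
  raise DeadlockError, "Deadlock found when trying to get lock" if calls < 3
  :committed
end
# result == :committed after two retried attempts
```

In a Rails app the block body would wrap `ActiveRecord::Base.transaction { ... }`, and the caveat from the comments below applies: only retry operations that are idempotent.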

Mark Bolusmjak
  • There are two kinds of deadlocks MySQL generates: those caused by a "Mexican standoff" situation where you're locking A,B in one place and B,A in another, so neither can get the second half of its locks, and those caused by failing to get a write lock on a heavily contested table. Even though the latter isn't technically a deadlock, it shows up as one. Which sort do you think you're encountering? Keeping a close eye on `SHOW PROCESSLIST` is always a good idea. – tadman Jan 27 '15 at 21:13
  • Actually, a bit of both if I understand the documentation correctly. Transaction A gets a shared lock for reading a table. Transaction B asks for an exclusive lock to write to the same table. Transaction A then tries to upgrade to an exclusive lock on the same table it holds a shared lock on. – Mark Bolusmjak Jan 27 '15 at 21:16
  • Strangely, the thing that is holding the S lock and waiting for the X lock is a transaction with only an update statement (which I'd expect to go right for an X lock as this is required for updating!). – Mark Bolusmjak Jan 27 '15 at 21:27
  • Unless you have designed your operations to be [idempotent](http://en.wikipedia.org/wiki/Idempotence) then retrying queries could cause chaos. This is where having `UNIQUE` indexes to avoid duplication helps considerably. Then you can retry with impunity. I think the reason for not including behaviour like that in the core is to avoid what could be side-effects far more severe than the problem that solution intends to solve. – tadman Jan 27 '15 at 21:31
  • Indeed, in the article they solved the problem by ensuring idempotency at the request level, and retrying the request until fulfilled. – Mark Bolusmjak Jan 27 '15 at 21:38
  • For tables with a lot of contention, you might want to consider a message queue for jobs that can be retried as many times as necessary to get a reliable commit. This avoids blocking the application. – tadman Jan 27 '15 at 22:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/69698/discussion-between-z5h-and-tadman). – Mark Bolusmjak Jan 27 '15 at 22:40

1 Answer


The fix, for me, was a better index.

The UPDATE in question was part of a query with a join, and the existing indexes were not sufficient for MySQL to resolve the join and locate the target rows efficiently.

Adding the appropriate index completely eliminated the deadlocks, even in tests under unreasonably heavy concurrent load.
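The answer doesn't include the actual schema, so the migration below is purely hypothetical; it only illustrates the shape of the fix: indexing the columns the join and WHERE clause use, so InnoDB row-locks only the rows the UPDATE actually matches rather than everything it scans:

```ruby
# Hypothetical table and column names, for illustration only.
class AddIndexForContestedUpdate < ActiveRecord::Migration
  def change
    # Without a usable index, MySQL scans (and locks) far more rows
    # than the UPDATE ultimately changes, widening the deadlock window.
    add_index :line_items, [:order_id, :state]
  end
end
```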

Mark Bolusmjak