Haskell: TVar: Preventing starvation

Question

I'm considering using a TVar to store some state in a web application (that can be recreated on restart). However, the contention aspects of TVar concern me. It seems that a frequent short running transaction can starve out longer transactions by continually interrupting them. Also, as more longer running transactions keep restarting, this would increase load on the CPU, tending to further increase the length of these transactions. Eventually I feel this could cause the server to become completely unresponsive.

Considering this, I have these questions:

(1) Can TVar (or another data type) use locks, not simultaneous attempts/retries.

(2) Can TVar (or another data type) have some different contention mechanism, i.e. "let transactions run for a second before running another transaction", or at least some guarantee that transactions will eventually complete (i.e. a contention algorithm that prevents starvation for longer running transactions).

Note `retry` doesn't restart the transaction immediately; transactions only retry when the `TVar`s they use are modified. — ehird, Apr 11 '12 at 05:23
@ehird: Don't transactions automatically retry immediately if another transaction writes to the TVar they previously read (even without a call to `retry`)? — Clinton, Apr 11 '12 at 05:33
@Clinton: Yes they retry, but only after the runtime system detected a possibility of a different outcome of the transaction. I.e. it waits till one the TVars read before the retry have changed. Which turns busy waiting into a push notification scheme. — jmg, Apr 11 '12 at 05:47
@jmg: If another transaction writes to the TVar it has read, doesn't it retry immediately (as the TVar has changed)? — Clinton, Apr 11 '12 at 05:53
@Clinton: But only if another transaction wrote into such a TVar. Otherwise it waits until that happens. And I think the transaction is restarted after the other finished. Otherwise there would be a conflict immediately. — jmg, Apr 11 '12 at 05:56
@jmg: "But only if another transaction wrote into such a TVar". That's my point. This would be trivial if there were only read transactions and write transactions without contention. The bad behaviour seems to occur when many transactions are being applied to one TVar. As contention increases, CPU increases, which further increases contention. Also, a variable only has to be used 1% of the time for this problem to occur, if a variable is accessed for 0.0001 seconds every 0.01 seconds, a 0.1 second long transaction (one that needs to read, do some processing and write) will never complete. — Clinton, Apr 11 '12 at 06:01
@Clinton: Livelock is always a concern (but seldom an actual problem), but avoidance strategies are very concrete. You'll need to specify what transactions you intend to happen with what sort of constraints before an effectie answer can be given. — sclv, May 20 '12 at 18:52

Peter · Answer 1 · 2012-05-19T04:04:31.657

5

I don't think there's a way to guarantee starvation freedom, unless you change the runtime code of the STM system itself. In my opinion, bringing in locks to avoid contention among TVars defeats the purpose of having STM in the first place, since the whole point of using STM is to get rid of the classic error-prone lock-based approach to concurrent programming.

Sure, starvation might cause significant performance loss, but only under the assumption that such large transactions are actually necessary. One design principle that I try to keep in mind, is to use TVars at a low granularity level. For example, instead of putting an entire Data.Map into a TVar, which might cause contention everytime an entry is updated, you can use a more STM-friendly data structure, like skiplists [1].

[1] http://hackage.haskell.org/package/tskiplist

edited May 19 '12 at 04:04

answered Apr 11 '12 at 05:35

Peter

1,693
13
18

I don't really understand how STM is less error-prone. Don't frequent short running transactions effectively place a lock that is never released for longer running transactions? Isn't that more error-prone as under a lock based system, at least at some point the longer running transaction will grab the lock and complete? STM seems to force the programmer to ensure that their longest running transaction takes less time to complete than the longest gap between transactions. Perhaps I'm missing something here though. – Clinton Apr 11 '12 at 05:43
If I understand correctly, under a lock system, if a transaction is currently progressing, another transaction needs to wait. Under STM, if a transaction is currently progressing, instead it begins to do work it will eventually scrap anyway. In both cases no useful work gets done, so I don't see how STM is less "locking" than ordinary locks. But again, perhaps I'm missing something. – Clinton Apr 11 '12 at 05:57
@Clinton: The primary problem with locks is not starvation, but to few locking or deadlocks. So, problems with locks are usually correctness not performance problems. Whereas problems with STM might be performance problems. You should also consider the possibilities of laziness. Laziness makes it possible to write the result of a long running pure computation in a short running transaction into a TVar. – jmg Apr 11 '12 at 05:59
@Clinton: Concurrency is not about how much locking happens. With manual locks you can have the "right amount" of locking but still run into deadlocks when you combine modules or libraries. Whereas STM is composable. You should have look at this: http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/stm.pdf – jmg Apr 11 '12 at 06:00
@jmg: I've read through that paper before, and I'm not really sure about the logic. If I have a function `f` that has a transaction that runs for 0.01 seconds for 0.00001 seconds, and another function `g` that has a transaction that uses the same `TVar` that runs every 10 seconds for 0.1 seconds, I can't compose the two, `f` will "lock out" `g`. I could compose the two using locks however (although, `f` would be slightly delayed at times). It seems locks are more composable than STM. – Clinton Apr 11 '12 at 06:08
@Clinton: I've posted my answer as a separate answer. – jmg Apr 11 '12 at 06:17
@Clinton: strictly speaking that's not "composition", that's just running two transactions in the same program. Yes the situation you describe could lead to starvation, but in practice it rarely occurs. Good practice is to make all your transactions short. If you have high contention for a particular piece of shared state, then you might do better with something other than STM. (also scheduling strategies to avoid starvation is an active research area) – Simon Marlow May 18 '12 at 20:25

sclv · Accepted Answer · 2012-05-21T01:13:26.307

This is only a concern if you have many cheap transactions that update data and a few expensive ones that read it. Analytics on a live-updated dataset, perhaps.

If you're actually concerned about this, consider using a flag TVar. Set it to false and check that it is false at the start of every cheap transaction, calling retry otherwise. Then simply set it to true before entering into your longrunning transaction, and set it to false on exit. Alternately, you can simply guard your state behind a TMVar. Your longrunning computation takes the tmvar atomically, does what it feels like, and then returns it. The other transactions take place entirely within a single actual STM transaction.

Remember also, a long running STM transaction is sort of a tricky beast. Due to laziness, you can put an expensive value into a var cheaply. You can also read a "snapshot" of data from a whole bunch of vars very quickly. To have a genuinely long-running transaction you need to read from a whole bunch of vars, then based on what you've read, compute what vars you are going to write new values into (or read values from), and that computation itself must be expensive. So likely you're not even in that scenario to begin with!

score 0 · Answer 3 · answered Apr 11 '12 at 06:17

This was written as a comment to one of Clinton's comments on Peters answer. But it became to long.

Consider, you have two bank accounts: A and B. Each is protected by its own lock. Now, you have two transaction the first transfers money from A to B, and the second from B to A. Each takes first the lock of the source account and then of the destination account, transfers the money and releases the locks. If your unlucky the two transactions will run into a dead lock, and nothing will ever be done with those two accounts. If you do that in STM they will run after each other. If you have infinite many of the first kind, they might starve the second transaction. But you still get a lot done. While with locking nothing ever happens.

STM guarantees that there are no data races with TVars! None what so ever. With locking you can arrive at that conclusion after very careful inspection of your code. And every line you add might invalidate your conclusion completely.

Haskell: TVar: Preventing starvation

3 Answers3