1

I've been asking a few questions about concurrency in Haskell, particularly about TVar, and I've had concerns about livelock with TVar.

Instead, I propose this solution:

  1. Wrap all of the shared data in the program in one data structure, and wrap that in an IORef.

  2. Simply make any changes using atomicModifyIORef.

I believe this prevents both deadlocks and livelocks (whereas TVar only prevents the former). Also, because atomicModifyIORef simply links another thunk into a chain (which is a couple of pointer operations), it is not a bottleneck. All of the actual operations on the data can be done in parallel, as long as they do not mutually depend on each other. The Haskell runtime system will work this out.
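In code, the idea would be something like this (a minimal sketch; SharedData and its fields are just placeholders for whatever the program actually shares):

import Data.IORef

-- Placeholder for all of the program's shared state.
data SharedData = SharedData
  { counter :: Int
  , names   :: [String]
  }

-- One global reference wrapping everything.
newShared :: IO (IORef SharedData)
newShared = newIORef (SharedData 0 [])

-- Every change goes through atomicModifyIORef, which just swings a
-- pointer to a new thunk; no lock is held while the update runs.
addName :: IORef SharedData -> String -> IO ()
addName ref n =
  atomicModifyIORef ref (\sd -> (sd { names = n : names sd }, ()))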

I however feel like this is too simple. Are there any "gotchas" I've missed?

Robin Green
Clinton
  • Why don't you just use STM? It is much simpler and works very well. – Gabriella Gonzalez Apr 12 '12 at 02:13
  • Why is STM simpler than IORef? There's no need to worry about livelock with IORef, which makes IORef simpler, yes? – Clinton Apr 12 '12 at 02:16
  • I'm pretty sure STM doesn't lead to livelocks, but I may be mistaken. – Gabriella Gonzalez Apr 12 '12 at 03:31
  • 1
    @Gabriel Gonzalez: Consider a transaction that takes 1ms to run and continually commits every 100ms. Any transaction that takes 200ms to run on that variable will never complete, because it will keep hitting conflicts when it goes to commit. It will keep retrying, keeping the server busy doing useless work. (There's a sketch of this scenario just after these comments.) – Clinton Apr 12 '12 at 03:45
  • 5
    "All of the actual operations on the data can be done in parallel, as long as they do not mutually depend on each other." <- I can pretty much guarantee that any interesting concurrent program does not have the property in the second clause. You will eventually need to deal with conflicts, at which point you will wish you had used STM or MVars in the first place. – Daniel Wagner Apr 12 '12 at 04:03
  • Daniel: Could you show an example where using `atomicModifyIORef` on a single `IORef` causes any conflicts that need to be "dealt" with? – Clinton Apr 12 '12 at 04:11
  • Have you actually encountered a situation with STM where this occurs? You are assuming a specific STM implementation, and I'm pretty sure the GHC developers wrote a smart implementation. – Gabriella Gonzalez Apr 12 '12 at 04:16
  • @Gabriel Gonzalez: http://stackoverflow.com/questions/6439925/poor-performance-lockup-with-stm I'm quite sure I read about livelocks also in the STM academic papers. In any case, my question is whether my IORef approach has any issues, not the STM approach. The IORef approach I believe is as parallel as the STM approach, I'm looking for evidence that it is not, or that there is some other issue with it (i.e. deadlocks). – Clinton Apr 12 '12 at 04:27
  • @Gabriel Gonzalez: Also, here's another example that illustrates a livelock: http://stackoverflow.com/questions/3031878/stm-monad-problem – Clinton Apr 12 '12 at 04:29
  • Also, I believe `IORef` is standard Haskell, whereas `TVar` is a GHC extension. Unless there's a particular reason why `IORef` doesn't work in parallel as well as `TVar`, I see no reason to use `TVar`. My question is: if there is a particular problem with `IORef`, what is it? – Clinton Apr 12 '12 at 04:32
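To make the starvation scenario from the comments concrete, here is a minimal sketch (the timings are made up, and whether this actually livelocks depends on the STM implementation; GHC validates a transaction's read set when it tries to commit):

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

main :: IO ()
main = do
  tv <- newTVarIO (0 :: Int)
  -- Short transaction: commits a write every 100ms.
  _ <- forkIO $ forever $ do
    atomically $ modifyTVar' tv (+ 1)
    threadDelay 100000
  -- Long transaction: reads the TVar, then spends well over 100ms
  -- inside the transaction (simulated with a slow pure computation).
  -- By commit time its read may be stale, forcing it to retry again
  -- and again.
  n <- atomically $ do
    x <- readTVar tv
    let y = sum [1 .. 50000000 :: Int]  -- stand-in for real work
    y `seq` return x
  print n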

3 Answers

7

This design would probably be ok if the following are true:

  • reads will be much more prevalent than writes
  • a number of reads will be interspersed between writes
  • (possibly) writes will affect only a small portion of the global data structure

Of course, given those conditions, pretty much any concurrency system would be fine. Since you're concerned about livelock, I suspect you're dealing with a more complicated access pattern. In that case, read on.

Your design appears to be guided by the following chain of reasoning:

  1. atomicModifyIORef is very cheap, because it just creates thunks

  2. because atomicModifyIORef is cheap, it's not going to cause thread contention

  3. Cheap data access + no contention = Concurrency FTW!

Here's the missing step in this reasoning: your IORef modifications only create thunks, and you have no control over where thunks are evaluated. If you can't control where the data is evaluated, you have no real parallelism.

Since you haven't yet presented the intended data access patterns, this is speculation; however, I expect that your repeated modifications to the data will build up a chain of thunks. Then at some point you'll read from the data and force an evaluation, causing all of those thunks to be evaluated sequentially in one thread. At that point, you may as well have written single-threaded code to begin with.
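A hypothetical illustration of that build-up (the sizes are arbitrary):

import Data.IORef

main :: IO ()
main = do
  ref <- newIORef [1 .. 1000 :: Int]
  -- None of these 10000 modifications does any arithmetic; each one
  -- just wraps the previous (unevaluated) list in another map thunk.
  mapM_ (\_ -> atomicModifyIORef ref (\xs -> (map (+1) xs, ())))
        [1 .. 10000 :: Int]
  -- The first reader pays for the whole chain at once, evaluated
  -- sequentially in this thread.
  xs <- readIORef ref
  print (sum xs)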

The way around this is to ensure that your data is evaluated (at least as far as you would like it to be) before it's written into the IORef. This is what the return parameter of atomicModifyIORef is for.

Consider these functions, meant to modify aVar :: IORef [Int]:

doubleList1 :: [Int] -> ([Int],())
doubleList1 xs = (map (*2) xs, ())

doubleList2 :: [Int] -> ([Int], [Int])
doubleList2 xs = let ys = map (*2) xs in (ys,ys)

doubleList3 :: [Int] -> ([Int], Int)
doubleList3 xs = let ys = map (*2) xs in (ys, sum ys)

Here's what happens when you use these functions as arguments:

  1. !() <- atomicModifyIORef aVar doubleList1 - only a thunk is created, no data is evaluated. An unpleasant surprise for whichever thread reads from aVar next!

  2. !oList <- atomicModifyIORef aVar doubleList2 - the new list is evaluated only so far as to determine the initial constructor, that is (:) or []. Still no real work has been done.

  3. !oSum <- atomicModifyIORef aVar doubleList3 - by evaluating the sum of the list, this guarantees that the computation is fully evaluated, and in this thread.

In the first two cases, there's very little work being done so the atomicModifyIORef will exit quickly. But that work wasn't done in that thread, and now you don't know when it will happen.

In the third case, you know the work was done in the intended thread. First a thunk is created and the IORef updated, then the thread begins to evaluate the sum and finally returns the result. But suppose some other thread reads the data while the sum is being calculated. It may start evaluating the thunk itself, and now you've got two threads doing duplicate work.

In a nutshell, this design hasn't solved anything. It's likely to work in situations where your concurrency problems weren't hard anyway, but for extreme cases like the ones you've been considering, you're still going to be burning cycles with multiple threads doing duplicate work. And unlike STM, you have no control over how and when to retry. With STM you can at least abort in the middle of a transaction; with thunk evaluation it's entirely out of your hands.
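Here's a hypothetical sketch of that duplicated work (the sizes and the delay are arbitrary, for demonstration only):

import Control.Concurrent
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef [1 .. 2000000 :: Int]
  done <- newEmptyMVar
  -- Thread A: installs the doubled list, then starts forcing it.
  _ <- forkIO $ do
    s <- atomicModifyIORef ref (\xs -> let ys = map (*2) xs
                                       in (ys, sum ys))
    putMVar done $! s
  threadDelay 1000  -- let thread A get started (demo only)
  -- Main thread: reads the same list mid-evaluation and forces it
  -- too. Without -feager-blackholing, both threads may evaluate the
  -- same thunks, duplicating the work.
  print . sum =<< readIORef ref
  print =<< takeMVar done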

John L
  • Points 2. and 3. are not entirely accurate. Neither causes any evaluation of the list unless you evaluate the result. `atomicModifyIORef` is quite non-strict; even `atomicModifyIORef var undefined` won't throw anything until you either force the result or the value of the IORef. – hammar Apr 12 '12 at 11:25
  • @hammar - that's a good point. I was trying to avoid cluttering the examples with bang patterns and/or seqs, but as a result got the logic wrong. Cutting corners doesn't pay. In the end, this might be even stronger support for the argument that relying on thunk updates for concurrency doesn't scale or compose. – John L Apr 12 '12 at 12:27
  • @John L: Why multiple threads doing duplicate work? Can't I easily blackhole thunks just by doing "-feager-blackholing" so they are only evaluated once? Also, when will there need to be retries? – Clinton Apr 12 '12 at 14:41
  • @Clinton - I meant retry as shorthand for abort or retry. Consider: thread A updates the IORef and starts evaluating, and the result won't actually be used here. Thread B updates the IORef again. Now thread A will continue to evaluate its stale data, wasting cycles. With STM you may be able to avoid this extra work. – John L Apr 12 '12 at 15:51
  • 2
    Also "-feager-blackholing" will help, but there is a lot of supposition here. Without any problem description (after 7 questions!), people are just coming up with ways any particular scheme will fail. This is architecture by blind committee; you should try some code and take measurements. – John L Apr 12 '12 at 16:03
  • @Clinton - who knows? We've got no idea what your architecture or data is. – John L Apr 12 '12 at 16:03
  • @John L: Can you give an example where `STM` will avoid extra work that can not be avoided using `IORef`s? – Clinton Apr 12 '12 at 16:09
5

Well, it's not going to compose well. And serializing all of your shared-memory modifications through a single IORef means that only one thread will be able to modify shared memory at a time; all you've really done is build a global lock. Yes, it will work, but it will be slow and nowhere near as flexible as TVars or even MVars.
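For contrast, here's a minimal sketch of the kind of decomposition TVars allow (Accounts, alice, and bob are made-up names): transactions that touch different TVars never conflict, so they can commit concurrently, whereas a single global IORef funnels every update through one point.

import Control.Concurrent.STM

data Accounts = Accounts { alice :: TVar Int, bob :: TVar Int }

deposit :: TVar Int -> Int -> STM ()
deposit acct n = modifyTVar' acct (+ n)

main :: IO ()
main = do
  a <- newTVarIO 0
  b <- newTVarIO 0
  let accts = Accounts a b
  -- These two transactions share no TVars, so neither can force the
  -- other to retry, however the threads interleave.
  atomically $ deposit (alice accts) 10
  atomically $ deposit (bob accts) 20
  totals <- atomically $ (,) <$> readTVar a <*> readTVar b
  print totals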

dented42
  • dented42: Isn't it only the creation of the thunks that is serialized? Can't they be executed in parallel? – Clinton Apr 12 '12 at 01:30
  • 3
    @Clinton - you have no control over when thunks will be evaluated, other than to specify that evaluation must occur no later than a certain point in the control flow. Trying to ensure thunks are evaluated in parallel would be near-impossible to set up and debug. – John L Apr 12 '12 at 09:22
  • If the transactions are guaranteed to be atomic then they can't be allowed to interfere with each other, which is done by only letting a single atomic block of code run at a time. If multiple 'atomic' thunks ran at the same time then the compiler wouldn't be able to guarantee that they don't interfere with each other. And as for when the thunks will be evaluated, the compiler/runtime only promises to evaluate a thunk before you need its value. – dented42 Apr 14 '12 at 08:22
1

AFAICT if your computation leaves unevaluated thunks after it does its thing with the IORef contents, those thunks will simply be evaluated in whatever thread tries to use the result, rather than being evaluated in parallel as you would like. See the gotchas section of the MVar documentation.

It might be more interesting and helpful for others if you provided a concrete problem that you're trying to solve (or a simplified, but similar one).

jberryman
  • jberryman: If two threads try to use two different results, won't they be evaluated in parallel? Of course if there's only one thread it will be done serially, but that's trivial and doesn't explain anything, does it? – Clinton Apr 12 '12 at 02:15
  • In this case, my usage is a webserver (using Yesod). So it's naturally multithreaded. I want to ensure that changes to my data are also multithreaded. What I'm proposing here is that with IORef they are, because the only thing that is serialized is the creation of thunks, which is a comparatively small amount of the workload. – Clinton Apr 12 '12 at 02:18
  • @Clinton: It depends on the dependencies between the two results. If they are completely independent, then they should be evaluated in parallel (there's a sketch of this case after these comments). If not, then any part that both depend on will be evaluated by one thread, possibly blocking the other thread if it needs that result at the same time. It's very difficult to say anything about how efficient this will be without a concrete example, so I recommend you try writing a prototype of what you were thinking of, and analyze its performance using [ThreadScope](http://www.haskell.org/haskellwiki/ThreadScope). – hammar Apr 12 '12 at 03:03
  • @hammar: Of course, I understand that in `f(g(x))` you can't do `f` and `g` at the same time. What I want to ensure is that when running `h(f(x1), g(x2))`, `f` and `g` can run at the same time, even if `x1` and `x2` are part of an `IORef`. If they can, then `IORef` solves the livelock problems that exist with `TVar`. – Clinton Apr 12 '12 at 03:17
  • @Clinton: it does prevent livelock, though the theoretical STM problem this solves is unlikely to actually be a problem in practice. These two solutions are actually quite similar: `atomicModifyIORef` and `atomically . modifyTVar`. IORefs do have lower overhead if you find them savory. – Nathan Howell Apr 12 '12 at 07:37
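A minimal sketch of the independent case hammar describes (the sizes are arbitrary; compile with -threaded and run with +RTS -N to see the parallelism):

import Control.Concurrent
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef ( map (*2) [1 .. 5000000 :: Int]
                  , map (*3) [1 .. 5000000 :: Int] )
  (xs, ys) <- readIORef ref
  d1 <- newEmptyMVar
  d2 <- newEmptyMVar
  -- Each thread forces one independent component; the two sums can
  -- proceed in parallel because their thunks share nothing.
  _ <- forkIO $ putMVar d1 $! sum xs
  _ <- forkIO $ putMVar d2 $! sum ys
  s1 <- takeMVar d1
  s2 <- takeMVar d2
  print (s1 + s2)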