I am reading about semi-explicit parallelism in Haskell, and I am a bit confused. The primitive in question is:
par :: a -> b -> b
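To make the question concrete, here is the kind of usage I have in mind (a minimal sketch using par and pseq from Control.Parallel in the parallel package; pfib and the concrete numbers are just my own example):

    import Control.Parallel (par, pseq)

    -- Spark the evaluation of one recursive call so that it may be picked up
    -- by another core, while this thread evaluates the other call, then combine.
    pfib :: Int -> Int
    pfib n
      | n < 2     = n
      | otherwise = x `par` (y `pseq` (x + y))
      where
        x = pfib (n - 1)
        y = pfib (n - 2)

    main :: IO ()
    main = print (pfib 35)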
People say that this approach allows us to parallelize automatically, by evaluating every sub-expression of a Haskell program in parallel. But this approach has the following disadvantages:
1) It creates far too many small items of work, which cannot be scheduled efficiently. As I understand it, if you used the par function on every line of a Haskell program, it would create far too many threads, which is not practical at all. Is that right?
2) With this approach, parallelism is limited by the data dependencies in the source program. If I understand correctly, this means every sub-expression must be independent of the others; for example, in par a b, the expressions a and b must be independent (see the toy example after this list).
3) The Haskell runtime system does not necessarily create a thread to compute the value of the expression a. Instead, it creates a spark, which has the potential to be executed on a different thread from the parent thread.
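Regarding point 2, here is the kind of dependent case I mean (again just my own toy example): since b here uses a directly, my understanding is that the spark for a is mostly useless, because the parent thread will evaluate a itself as soon as it needs it and the spark just fizzles:

    import Control.Parallel (par)

    -- The second argument (expensive + 1) depends on the first (expensive),
    -- so the parent thread ends up evaluating 'expensive' itself and no
    -- useful work happens in parallel.
    dependent :: Int -> Int
    dependent n = expensive `par` (expensive + 1)
      where
        expensive = sum [1 .. n]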
So, my question is: in the end, will the runtime system create a thread to compute a or not? Or does it create a new thread to compute a only if the expression a is needed to compute the expression b, and otherwise not? Is that right?
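In case it helps to pin down what I am asking: my understanding is that with GHC you can compile with -threaded and run with +RTS -N -s, and the statistics then include a SPARKS line saying how many sparks were converted, fizzled, or GC'd. For example, with the pfib.hs sketch from above:

    $ ghc -O2 -threaded -rtsopts pfib.hs
    $ ./pfib +RTS -N4 -s

Is "converted" exactly the case where the runtime does pick up the spark for a and runs it on another thread?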
I am a newbie to Haskell, so these questions may be quite basic for all of you. Thanks for your answers.