
I've run some tests:

import Control.Parallel.Strategies
import Data.Vector as V
import Data.Maybe

parMapVec :: (a -> b) -> Vector a -> Vector b
parMapVec f v = runEval $ evalTraversable rpar $ V.map f v

range :: Integer -> Integer -> Vector Integer
range x y
  | x == y = x `cons` empty
  | x < y  = x `cons` (range (x + 1) y)
  | x > y  = (range x (y + 1)) `snoc` y

fac :: Integer -> Integer
fac n
  | n < 2     = 1
  | otherwise = n * (fac $ n - 1)

main :: IO ()
main = do
  let result = runEval $ do
        let calc = parMapVec fac $ 80000 `range` 80007
        rseq calc
        return calc
  putStrLn $ show result

I also ran it with the following modification to main, to make sure that my parMapVec function wasn't the problem:

main = do
  let result = runEval $ do
        let calc = parMap rpar fac [80000..80007]
        rseq calc
        return calc
  putStrLn $ show result

I compiled with ghc --make parVectorTest.hs -threaded -rtsopts and ran both with ./parVectorTest +RTS -s.

Here's what I found with the version with vectors:

56,529,547,832 bytes allocated in the heap
10,647,896,984 bytes copied during GC
    7,281,792 bytes maximum residency (16608 sample(s))
    3,285,392 bytes maximum slop
            21 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     82708 colls,     0 par    0.828s   0.802s     0.0000s    0.0016s
Gen  1     16608 colls,     0 par   15.006s  14.991s     0.0009s    0.0084s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 8 (7 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.368s  (  5.369s elapsed)
GC      time   15.834s  ( 15.793s elapsed)
EXIT    time    0.001s  (  0.000s elapsed)
Total   time   21.206s  ( 21.163s elapsed)

Alloc rate    10,530,987,847 bytes per MUT second

Productivity  25.3% of total user, 25.4% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

So that's good, except that I watched the process execute on my system monitor, and only one core was working at a time. Every time one of the results was printed out, the process would switch to a different core. So I thought something was wrong with my parMapVec function. But then I did the same thing with the version that uses lists:

56,529,535,488 bytes allocated in the heap
12,483,967,024 bytes copied during GC
    6,246,872 bytes maximum residency (19843 sample(s))
    2,919,544 bytes maximum slop
            20 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     79459 colls,     0 par    0.818s   0.786s     0.0000s    0.0009s
Gen  1     19843 colls,     0 par   17.725s  17.709s     0.0009s    0.0087s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 16 (14 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.394s  (  5.400s elapsed)
GC      time   18.543s  ( 18.495s elapsed)
EXIT    time    0.000s  (  0.000s elapsed)
Total   time   23.940s  ( 23.896s elapsed)

Alloc rate    10,479,915,927 bytes per MUT second

Productivity  22.5% of total user, 22.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

So there was more garbage collection, which makes sense. And there were also more sparks, which I don't know how to explain. This program exhibited the same behavior when I watched it execute on my system monitor.

I also ran both tests with ./parVectorTest +RTS -s -C0.01 because of the answer to this question and got basically the same results. I'm on a Lenovo Ideapad with 8 cores, running Ubuntu Linux 17.04. At the time of the tests, the only apps I had open were VS Code and my system monitor, although there were other processes taking up a very small portion of the processing power. Does a processor have to be completely idle to take a spark?

Eben Kadile
  • This seems a bit vague. Is the only issue that only a single hardware thread is being used? If so, this is due to the distinction between parallelism and concurrency. Note the text "using -N1" in your output. Pass e.g. `+RTS -N8` to your program – user2407038 Dec 23 '17 at 22:27
  • Thanks, this fixed it. You might as well turn this into an answer. – Eben Kadile Dec 24 '17 at 00:57

1 Answer


By default, GHC runs every program on a single capability (in effect, one OS thread doing Haskell work), even when it is compiled with -threaded. Note the text "using -N1" in your output: it indicates that the program is being run with one capability, so sparks are never evaluated in parallel. (The OS is still free to migrate that one thread between cores, which is why you see the work hop from core to core.)

In short: pass e.g. +RTS -N8 to your program. The -N flag is documented in the GHC User's Guide, in the section on RTS options for SMP parallelism.
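For example (assuming the file is still called parVectorTest.hs, as in the question; treat this as a sketch of the invocation):

ghc --make parVectorTest.hs -threaded -rtsopts
./parVectorTest +RTS -N8 -s

Passing -N without a number (+RTS -N) lets the RTS choose one capability per detected core.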


In a broad sense, this is due to the distinction between parallelism and concurrency, a distinction that several other SO questions try to explain. The difference can be summarized as:

  • parallelism: a task subdivided into similar chunks that run simultaneously on separate cores/CPUs at some point in time, for increased speed

  • concurrency: several tasks executed conceptually independently such that their execution times overlap, whether on the same thread through time slicing or on separate cores/CPUs; usually to use shared resources more efficiently

These definitions are somewhat contentious; sometimes the two are given opposite meanings, and sometimes they are used interchangeably. Still, for the purpose of understanding this problem (why you must pass another flag in addition to -threaded to make a 'parallel' program actually run in parallel), I believe they are useful definitions.
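To make the contrast concrete, here is a minimal sketch of both styles (the function names are just for illustration), using only the standard base and parallel libraries:

import Control.Concurrent (forkIO, threadDelay)
import Control.Parallel.Strategies (parMap, rpar)

-- Parallelism: one pure computation split into pieces that the RTS may
-- evaluate simultaneously on separate capabilities (only with +RTS -N<k>, k > 1).
parallelSums :: [Int]
parallelSums = parMap rpar sum [[1 .. 1000000], [1 .. 2000000]]

-- Concurrency: independent IO actions whose executions overlap in time;
-- this works even on a single capability.
concurrentGreetings :: IO ()
concurrentGreetings = do
  _ <- forkIO $ putStrLn "hello from a forked thread"
  threadDelay 100000  -- give the forked thread a chance to run
  putStrLn "hello from the main thread"

With +RTS -N1 the two sums in parallelSums are still evaluated one after the other; with +RTS -N2 or more they can run at the same time.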

user2407038
  • I googled "distinction between parallelism and concurrency" and the two top entries' quotations on the search results page seem to contradict each other. The third entry, Quora's, has a 2nd answer that says the two terms are interchangeable. Could you perhaps clarify the issue? Give links to resources that you approve of? Something? Thanks. – Will Ness Dec 24 '17 at 17:04
  • I'm afraid I can't find any such contradiction in the google results; nor any claim they are interchangeable terms. However, I've included some links in the answer (which agree with my understanding). – user2407038 Dec 24 '17 at 18:02
  • I guess things are somewhat complicated by the fact that the Haskell package in question is called `parallel` and claims to implement "parallel programming", when it really implements concurrent programming - indeed with only one physical thread, parallelism is by definition impossible. But I suppose the intent of `parallel` and libraries like it is to define a pure interface for parallelism, which, on a denotational level, is only implementing concurrency, but is intended to be used with `-N`. – user2407038 Dec 24 '17 at 18:08
  • I don't see what `parallel` has to do with concurrent programming. Concurrent programming features show up in `base`, `stm`, and `async`. `parallel` is all about parallelism, as best I can tell. – dfeuer Dec 24 '17 at 18:45
  • link-only answers, without any summary of their contents, are frowned upon. I'm not referring to the whole of your answer of course, but to the part with the link. It would really help if you could provide some short description, is all I'm saying. – Will Ness Dec 24 '17 at 19:44
  • @WillNess Sure, but the first paragraph isn't really important to the answer; nor can I see a way to summarize the distinction (or what I believe it to be) better than [this](https://stackoverflow.com/questions/1050222/concurrency-vs-parallelism-what-is-the-difference/1050257#1050257) answer. Maybe I should just quote it? Given how contentious this section of the answer has been, I'm inclined to just delete it (but this leaves a very confusing comment train). – user2407038 Dec 24 '17 at 20:36
  • @user2407038 I was bold and made an edit. :) Perhaps it'll serve as a starting point for your further edit, correcting the wording or something, or you will want to revert it altogether. :) Hope this helps. – Will Ness Dec 24 '17 at 21:09
  • @WillNess Thanks a lot for your help. I think I've finally got this answer in a place where I'm satisfied with it. Notably I've removed the language "a very well explored topic", which seems to imply that it is a trivial topic, which we've discovered it most certainly isn't... – user2407038 Dec 25 '17 at 16:12