I recently found out about parallel collections in Scala 2.9 and was excited to see that the degree of parallelism can be set using collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism.
However, when I ran an experiment adding two vectors of one million elements each, I found that:
- Using a parallel collection with parallelism set to 64 is only as fast as the sequential version (shown in the results).
- Increasing the value passed to setParallelism changes performance in a non-monotonic way. I would have expected at least monotonic behaviour (that is, performance should not degrade as I increase parallelism).
Can someone explain why this is happening?
import scala.util.Random

object examplePar extends App {
  val Rnd = new Random()
  val numSims = 1  // number of timed runs per parallelism level
  val x = for (j <- 1 to 1000000) yield Rnd.nextDouble()
  val y = for (j <- 1 to 1000000) yield Rnd.nextDouble()
  val parInt = List(1, 2, 4, 8, 16, 32, 64, 128, 256)
  var avg: Double = 0.0
  var currTime: Long = 0
  for (j <- parInt) {
    // Set the parallelism level of the default fork/join pool
    collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(j)
    avg = 0.0
    for (k <- 1 to numSims) {
      currTime = System.currentTimeMillis()
      // Timed region: zip, convert to a parallel collection, then add element-wise
      (x zip y).par.map(p => p._1 + p._2)
      avg += (System.currentTimeMillis() - currTime)
    }
    println("Average Time to execute with Parallelism set to " + j.toString + " = " + (avg / numSims).toString + "ms")
  }
  // Sequential baseline
  currTime = System.currentTimeMillis()
  (x zip y).map(p => p._1 + p._2)
  println("Time to execute using Sequential = " + (System.currentTimeMillis() - currTime).toString + "ms")
}
The results of running the example using Scala 2.9.1 on a four-core processor are:
Average Time to execute with Parallelism set to 1 = 1047.0ms
Average Time to execute with Parallelism set to 2 = 594.0ms
Average Time to execute with Parallelism set to 4 = 672.0ms
Average Time to execute with Parallelism set to 8 = 343.0ms
Average Time to execute with Parallelism set to 16 = 375.0ms
Average Time to execute with Parallelism set to 32 = 391.0ms
Average Time to execute with Parallelism set to 64 = 406.0ms
Average Time to execute with Parallelism set to 128 = 813.0ms
Average Time to execute with Parallelism set to 256 = 469.0ms
Time to execute using Sequential = 406ms
Though these results are from a single run, they are consistent when averaged over more runs.
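For reference, averaging over more runs can be sketched roughly as below. This is only a minimal sketch, not the exact code behind the numbers above: the helper name averageMillis, the warm-up count of 5 and the run count of 20 are all assumptions chosen for illustration. It also zips the two vectors once, outside the timed region, so that only the element-wise addition is measured.

```scala
import scala.util.Random

object TimingSketch {
  // Hypothetical helper (not part of the original program): evaluates `body`
  // `warmup` times without timing it, then returns the mean wall-clock time
  // in milliseconds over `runs` timed evaluations.
  def averageMillis[A](warmup: Int, runs: Int)(body: => A): Double = {
    require(runs > 0, "runs must be positive")
    var i = 0
    while (i < warmup) { body; i += 1 }
    var totalNanos = 0L
    i = 0
    while (i < runs) {
      val start = System.nanoTime()
      body
      totalNanos += System.nanoTime() - start
      i += 1
    }
    totalNanos.toDouble / runs / 1e6
  }

  def main(args: Array[String]): Unit = {
    val rnd = new Random()
    val x = Vector.fill(1000000)(rnd.nextDouble())
    val y = Vector.fill(1000000)(rnd.nextDouble())
    val pairs = x zip y // zipped once, outside the timed region

    // Warm-up and run counts are arbitrary illustrative values.
    val seqMs = averageMillis(warmup = 5, runs = 20) {
      pairs.map(p => p._1 + p._2)
    }
    println("Sequential average = " + seqMs + "ms")
  }
}
```

The parallel case would use the same helper with pairs.par.map(p => p._1 + p._2) as the body, after calling setParallelism as in the program above.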