
I am using Scala parallel collections.

val largeList = list.par.map(x => largeComputation(x)).toList

It is blazing fast, but I suspect I may run into out-of-memory issues if too many "largeComputation" calls run in parallel.

Therefore, when testing, I would like to know how many threads the parallel collection is using and, if need be, how I can configure the number of threads for parallel collections.

Knows Not Much
  • Did you read this piece of [documentation](http://docs.scala-lang.org/overviews/parallel-collections/performance)? In particular the section "How big should a collection be to go parallel?" – hasumedic Aug 01 '16 at 14:52
  • I saw it, but it wasn't clear to me what they are doing. I know we need to do something with the ForkJoinTaskSupport thing... but what exactly is it? – Knows Not Much Aug 01 '16 at 15:11

1 Answer


The scaladoc explains how to change the task support of a parallel collection by wrapping a ForkJoinPool in a ForkJoinTaskSupport. When you instantiate the ForkJoinPool, you pass the desired parallelism level as its constructor argument. Quoting the scaladoc:

Here is a way to change the task support of a parallel collection:

import scala.collection.parallel._
val pc = mutable.ParArray(1, 2, 3)
pc.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(2))

So in your case it would be:

val parList = list.par
// numThreads is a placeholder for the parallelism level you want
parList.tasksupport = new ForkJoinTaskSupport(
  new scala.concurrent.forkjoin.ForkJoinPool(numThreads)
)
val largeList = parList.map(x => largeComputation(x)).toList
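
To answer the "how many threads is it using" part of the question, a task support also exposes a parallelismLevel value that you can print while testing. Below is a minimal, self-contained sketch, assuming Scala 2.11 or 2.12, where parallel collections are in the standard library and scala.concurrent.forkjoin.ForkJoinPool is still available (on newer versions I would expect java.util.concurrent.ForkJoinPool and the separate scala-parallel-collections module to be needed instead). The names ParallelismDemo and mapWithParallelism, and the body of largeComputation, are made up for illustration.

```scala
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

object ParallelismDemo {
  // Hypothetical stand-in for the expensive function in the question.
  def largeComputation(x: Int): Int = x * 2

  // Maps over `list` on a dedicated pool with the given parallelism and
  // returns the parallelism level in effect together with the results.
  def mapWithParallelism(list: List[Int], threads: Int): (Int, List[Int]) = {
    val pool = new ForkJoinPool(threads)
    try {
      val parList = list.par
      parList.tasksupport = new ForkJoinTaskSupport(pool)
      val level = parList.tasksupport.parallelismLevel
      (level, parList.map(largeComputation).toList)
    } finally pool.shutdown() // release the worker threads when done
  }

  def main(args: Array[String]): Unit = {
    // Parallelism level of the default task support, before any configuration.
    println(List(1, 2, 3).par.tasksupport.parallelismLevel)
    // Parallelism level and results after installing a pool of 2 threads.
    println(mapWithParallelism(List(1, 2, 3, 4, 5), 2))
  }
}
```

The default level is usually tied to the number of available processors, but printing it on the target machine is more reliable than assuming. Using a dedicated pool and shutting it down afterwards keeps the limit local to this one computation.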
Alexander Arendar