ForkJoinPool for parallel processing

Question

I am trying to run some code 1 million times. I initially wrote it using Threads, but this seemed clunky. After some more reading I came across ForkJoin. It seemed like exactly what I needed, but I can't figure out how to translate what I have below into "Scala style". Can someone explain the best way to use ForkJoin in my code?

val l = (1 to 1000000) map {_.toLong}
println("running......be patient")
l.foreach{ x =>
    if(x % 10000 == 0) println("got to: "+x)
    val thread = new Thread {
        override def run(): Unit = {
         //my code (API calls) here. writes to file if call success
        }
    }
    thread.start() //without start() the thread never runs
}

OK, somebody fill me in. Wouldn't `(1L to 1000000)` be more efficient than applying `map(_.toLong)` after the fact? — jwvh, Jul 28 '15 at 04:38
That is very likely, I started learning Scala today, so my code is definitely not optimized. — Rilcon42, Jul 28 '15 at 04:50

score 1 · Accepted Answer · edited May 23 '17 at 12:06

The easiest way is to use `par` (it uses a ForkJoinPool automatically):

 val l = (1L to 1000000L).toList
 //on Scala 2.13+ `.par` needs the scala-parallel-collections module
 //and `import scala.collection.parallel.CollectionConverters._`

 l.par.foreach { x =>
    if(x % 10000 == 0) println("got to: " + x) //runs in parallel
    //your code (API calls) here. also runs in parallel (on the same thread as the `println` above for a given x)
 }

Another way is to use Future:

import scala.concurrent._
import ExecutionContext.Implicits.global //default ExecutionContext, backed by a ForkJoinPool

val l = 1L to 1000000L

println("running......be patient")

l.foreach { x =>
    if(x % 10000 == 0) println("got to: "+x)
    Future {
       //your code (API calls) here. writes to file if call success
    }
}
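
The snippet above starts the futures but never waits for them, so a short program may exit before they finish. A minimal sketch of collecting the results and waiting, assuming a hypothetical `callApi` function in place of the real API call and an arbitrary one-minute timeout:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureDemo {
  // Hypothetical stand-in for the real API call (an assumption, not from the question).
  def callApi(x: Long): Long = x * 2

  // Starts one Future per id and combines them into a single Future of all results.
  def runAll(ids: Seq[Long]): Seq[Long] = {
    val all: Future[Seq[Long]] = Future.traverse(ids)(x => Future(callApi(x)))
    // Block the calling thread until every task is done (timeout chosen arbitrarily).
    Await.result(all, 1.minute)
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(1L to 1000L)
    println(s"completed ${results.size} calls")
  }
}
```

`Future.traverse` keeps the results in the same order as the input, which makes it easy to match each result back to its id.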

If the tasks block (as API calls usually do) and you still want the pool's other threads kept busy, mark the blocking code with `scala.concurrent.blocking`:

Future {
   scala.concurrent.blocking {
      //blocking API call here
   }
}

This tells the ForkJoinPool to compensate for the blocked thread with a new one, so you can avoid thread starvation (but there are some disadvantages, such as the pool creating extra threads).
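
As a rough illustration of the `blocking` marker, here is a sketch in which `Thread.sleep` stands in for a blocking API call. The `slowCall` helper and the timeout are assumptions made for the example:

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object BlockingDemo {
  // Hypothetical blocking call; Thread.sleep stands in for network I/O.
  def slowCall(x: Int): Int = {
    Thread.sleep(50)
    x + 1
  }

  // Wraps each call in `blocking` so the global pool may add threads
  // while the current ones are parked.
  def runBlocking(n: Int): Seq[Int] = {
    val futures = (1 to n).map(x => Future(blocking(slowCall(x))))
    Await.result(Future.sequence(futures), 1.minute)
  }

  def main(args: Array[String]): Unit = {
    println(runBlocking(8).mkString(", "))
  }
}
```

Whether the pool actually adds threads depends on the execution context; the global one honors `blocking`, while an arbitrary custom executor may ignore it.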

answered Jul 28 '15 at 04:24

dk14

Thanks for the detailed answer! Is there a reason to use `Future` over `ForkJoinPool` in this case if each API call is completely independent from each other? – Rilcon42 Jul 28 '15 at 04:42
`ForkJoinPool` is the [thread pool](https://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html) on which a `Future` is physically executed. `Future` is an abstraction used to create tasks for Java's thread pools (you can do it manually, but a future is easier). You can read more about futures on the internet, but the key advantage is that you can extract the result of a computation fairly easily and in a non-blocking way. You also get automatic support for pools (which you don't if you create threads manually), including ForkJoin as a particular case (Scala chooses it by default). – dk14 Jul 28 '15 at 09:25
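
To make the relationship between `Future` and the pool concrete, here is a sketch that runs futures on an explicitly created `ForkJoinPool` instead of the default one. The object name, the `parallelism` parameter, and the ten-second timeout are illustrative assumptions:

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ExplicitPoolDemo {
  // Squares each element on a dedicated ForkJoinPool and sums the results.
  def sumOfSquares(xs: Seq[Int], parallelism: Int): Int = {
    val pool = new ForkJoinPool(parallelism)
    // Futures created below are scheduled on `pool` through this context.
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val squares: Future[Seq[Int]] = Future.traverse(xs)(x => Future(x * x))
      // `map` transforms the result without blocking any thread.
      val total: Future[Int] = squares.map(_.sum)
      // Blocking here only to hand a plain value back to the caller.
      Await.result(total, 10.seconds)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    println(sumOfSquares(1 to 10, parallelism = 4))
  }
}
```

Swapping the implicit `ExecutionContext` is all it takes to move the same `Future` code onto a different pool, which is the flexibility manually created threads do not give you.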

score 0 · Answer 2 · answered Jul 28 '15 at 04:26

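In Scala, you can use `Future` and `Promise`; for this case a plain `Future` is enough:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global //default pool (a ForkJoinPool)

val l = 1L to 1000000L
println("running......be patient")
l.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x)
  Future {
    println(x) //runs asynchronously on the pool
  }
}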

answered Jul 28 '15 at 04:26

chengpohi
