Asynchronous IO in Scala with futures

Question

Let's say I'm getting a (potentially big) list of images to download from some URLs. I'm using Scala, so what I would do is :

import scala.actors.Futures._

// Retrieve URLs from somewhere
val urls: List[String] = ...

// Download image (blocking operation)
val fimages: List[Future[...]] = urls.map(url => future { download(url) })

// Do something (display) when complete
fimages.foreach (_.foreach (display _))

I'm a bit new to Scala, so this still looks a little like magic to me:

1. Is this the right way to do it? Any alternatives if it is not?
2. If I have 100 images to download, will this create 100 threads at once, or will it use a thread pool?
3. Will the last instruction (display _) be executed on the main thread, and if not, how can I make sure it is?

Thanks for your advice!

score 137 · Accepted Answer · answered Oct 27 '12 at 11:04

Use Futures in Scala 2.10. They were joint work between the Scala team, the Akka team, and Twitter to reach a more standardized future API and implementation for use across frameworks. We just published a guide at: http://docs.scala-lang.org/overviews/core/futures.html

Beyond being completely non-blocking (by default, though we provide the ability to do managed blocking operations) and composable, Scala 2.10's futures come with an implicit thread pool to execute your tasks on, as well as some utilities to manage timeouts.

// Note: the lowercase `future` method is the Scala 2.10 API; it was deprecated in 2.11 in favor of `Future { ... }`
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Retrieve URLs from somewhere
val urls: List[String] = ...

// Download image (blocking operation)
val imagesFuts: List[Future[...]] = urls.map {
  url => future { blocking { download(url) } }
}

// Do something (display) when complete
val futImages: Future[List[...]] = Future.sequence(imagesFuts)
Await.result(futImages, 10.seconds).foreach(display)

Above, we first import a number of things:

future: API for creating a future.
blocking: API for managed blocking.
Future: Future companion object which contains a number of useful methods for collections of futures.
Await: singleton object used for blocking on a future (transferring its result to the current thread).
ExecutionContext.Implicits.global: the default global thread pool, a ForkJoin pool.
duration._: utilities for managing durations for time outs.

imagesFuts remains largely the same as what you originally did. The only difference is that we wrap the download in blocking, the API for managed blocking. It notifies the thread pool that the block of code passed to it contains long-running or blocking operations. This allows the pool to temporarily spawn new workers so that the workers are never all blocked at once, which prevents starvation (locking up the thread pool) in blocking applications. The thread pool also knows when the code in a managed blocking block has completed, so it removes the spare worker thread at that point and shrinks back down to its expected size.
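As a rough illustration of managed blocking, here is a minimal sketch. It assumes a hypothetical `fakeDownload` that just sleeps in place of a real network call, and it uses `Future { ... }` rather than the 2.10-era `future` method so that it also compiles on current Scala versions. The object and method names are made up for this example.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ManagedBlockingSketch {
  // Hypothetical stand-in for a blocking download: sleeps, then returns a fake payload size.
  def fakeDownload(url: String): Int = {
    Thread.sleep(200)
    url.length
  }

  // Starts one future per URL. Wrapping the body in `blocking` tells the
  // global pool that the task may block, so it is allowed to add
  // compensating worker threads instead of starving.
  def downloadAll(urls: List[String]): List[Int] = {
    val futures = urls.map(url => Future { blocking { fakeDownload(url) } })
    Await.result(Future.sequence(futures), 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val urls = List.tabulate(32)(i => s"http://example.com/img$i.png")
    println(downloadAll(urls).sum)
  }
}
```

Removing the `blocking` wrapper should still give the same result, but the sleeping tasks would then be limited to the pool's normal parallelism, so the whole batch would likely take noticeably longer on a machine with few cores.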

(If you want to absolutely prevent additional threads from ever being created, then you ought to use an AsyncIO library, such as Java's NIO library.)

Then we use the collection methods of the Future companion object to convert imagesFuts from List[Future[...]] to a Future[List[...]].
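A small sketch of that conversion, using plain integers instead of images. The `squares` helper and the object name are hypothetical and exist only for this illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequenceSketch {
  // Turns a list of independent futures into one future of a list.
  // The result keeps the order of the input list, not the completion order.
  def squares(ns: List[Int]): Future[List[Int]] = {
    val each: List[Future[Int]] = ns.map(n => Future(n * n))
    Future.sequence(each)
  }

  def main(args: Array[String]): Unit = {
    println(Await.result(squares(List(1, 2, 3)), 5.seconds)) // prints List(1, 4, 9)
  }
}
```

Keep in mind that the combined future fails if any one of the input futures fails, so a single bad URL would fail the whole batch unless each future recovers on its own.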

The Await object is how we can ensure that display is executed on the calling thread: Await.result simply forces the current thread to wait until the future passed to it is completed. (This uses managed blocking internally.)

answered Oct 27 '12 at 11:04

Heather Miller


Thanks for the in-depth answer! If I understand correctly, if you don't specify "blocking", then the thread pool can potentially run out of workers and block forever if every worker stays busy indefinitely? Also, can I create my own `ExecutionContext` to force the completion callback (but not the actual background process, of course) to be executed asynchronously on a specific thread (I.e. the UI thread, using a framework-specific method)? – F.X. Oct 28 '12 at 09:34
Technically: Avoid blocking at all costs. Only do blocking if you have no other choice. – Viktor Klang Oct 28 '12 at 14:45
In my understanding, network calls _are_ blocking, are they not? If each network call has a timeout, would that count as blocking as well? – F.X. Oct 28 '12 at 19:48
Network calls would technically not be blocking, if you were to use an AsyncIO library like Java's NIO. – Heather Miller Oct 29 '12 at 08:38
For Futures, there is no reason for multiple futures with timeouts not to be handled in a non-blocking way. – Heather Miller Oct 29 '12 at 08:51
But to answer your earlier question- yes, the default FJPool can run out of workers if all threads block and you do not use managed blocking. And yes, you can create your own `ExecutionContext`, using Swing's `invokeLater`, for example, and explicitly pass that to the `foreach` on `futImages` instead of using the `Await.result` – Heather Miller Oct 29 '12 at 08:58
Thanks, that's what I wanted to know ;) I'll play around with all that stuff and see what I can do with it! – F.X. Oct 30 '12 at 02:48
I'm a noob in this and I have a question - when do the `imagesFuts` futures start executing? I suppose not in the map command, because you are attaching the "listeners" *after* this starts executing? – User Oct 10 '13 at 10:18
How does the stuff behind `blocking` actually work? Does it have its own thread pool, or does it simply create a new thread when we submit a task via `blocking`? – maks Nov 26 '13 at 15:08
Why are you using blocking in `url => future { blocking { download url } }`? Why not just use `url => future { download url }`? – Incerteza Dec 02 '13 at 01:29
By the way, the implicit managed blocking solution for `Await` isn't cool in practice. It allowed a very bad architectural solution (with blocking inside actors) in our project and caused serious performance problems - http://stackoverflow.com/questions/28044971/how-to-guarantee-sequentiality-for-forks-in-akka. As a result our routing framework worked only with the fj-pool and was creating about 5-6 threads per new message (under high load). And thanks to fj, the guy (architect) who did it doesn't even remember his mistake, as it was so easy to make. – dk14 Feb 18 '15 at 04:56

score 5 · Answer 2 · answered Oct 27 '12 at 08:39

val all = Future.traverse(urls) { url =>
  val f = future(download(url)) /*(downloadContext)*/
  // onComplete passes a Try, so display must handle both success and failure here
  f.onComplete(display)(displayContext)
  f
}
Await.result(all, ...)

Use scala.concurrent.Future in 2.10 (which is RC now); it uses an implicit ExecutionContext.
The new Future doc is explicit that onComplete (and foreach) may evaluate immediately if the value is available. The old actors Future does the same thing. Depending on what your requirement is for display, you can supply a suitable ExecutionContext (for instance, a single thread executor). If you just want the main thread to wait for loading to complete, traverse gives you a future to await on.
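To make the "suitable ExecutionContext" idea concrete, here is a minimal sketch under a few assumptions: the single-thread executor stands in for a UI thread, the thread name `display-thread` and all object and method names are invented for this example, and `Future { ... }` is used instead of the 2.10-era `future` method. The callback is registered with the custom context passed explicitly, so it runs there and not on the pool that computed the value.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object DisplayContextSketch {
  // Returns the name of the thread on which the completion callback ran.
  def callbackThreadName(): String = {
    // Single-thread executor playing the role of a dedicated display thread.
    val executor = Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "display-thread")
        t.setDaemon(true)
        t
      }
    })
    val displayContext = ExecutionContext.fromExecutor(executor)

    // The work itself runs on the global pool.
    val f = Future("image")(ExecutionContext.global)

    // The callback is scheduled on displayContext; the promise only
    // carries the observed thread name back to the caller.
    val seen = Promise[String]()
    f.foreach(_ => seen.success(Thread.currentThread.getName))(displayContext)

    try Await.result(seen.future, 5.seconds)
    finally executor.shutdown()
  }

  def main(args: Array[String]): Unit = {
    println(callbackThreadName()) // prints display-thread
  }
}
```

For a real GUI toolkit, the same shape should work with an `ExecutionContext` whose `execute` method hands the `Runnable` to the toolkit's own dispatch mechanism.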

score 3 · Answer 3 · answered Oct 27 '12 at 06:57

1. Yes, seems fine to me, but you may want to investigate the more powerful twitter-util or Akka Future APIs (Scala 2.10 will have a new Future library in this style).
2. It uses a thread pool.
3. No, it won't. You need to use the standard mechanism of your GUI toolkit for this (SwingUtilities.invokeLater for Swing or Display.asyncExec for SWT). E.g.
```
fimages.foreach(_.foreach(im => SwingUtilities.invokeLater(new Runnable { def run(): Unit = display(im) })))
```

answered Oct 27 '12 at 06:57

Alexey Romanov


Thanks for the answer, I'm happy to know my approach is sensible! I'm actually trying out Scala for Android, so this'll come in handy, compared to the horrendous Java syntax! – F.X. Oct 27 '12 at 07:06
Regarding #3, I was thinking and trying out a few simple test cases right before you wrote your answer, and it seems that it _does_ execute on the main thread. I just created a simple `future{"test"}` and ran `foreach(s => println(Thread.currentThread.getName()))` on it, which printed `main`. Am I misunderstanding something? – F.X. Oct 27 '12 at 07:08
@F.X. I just did the same twice in the Scala console (for the same future) and got `Thread-15` and `Thread-16`. It may depend on Scala version. – Alexey Romanov Oct 27 '12 at 07:23
I think the Scala console spawns threads for each command you type. I just tried `println(...getName()); f.foreach(s => ...getName())` (in one line) and got two times `Thread-20`. Weird. – F.X. Oct 27 '12 at 07:30
Yes, it seems so. At the very least, since the docs don't say it's called in the main thread, I wouldn't assume so. – Alexey Romanov Oct 27 '12 at 07:34
I'm going to ask on the Scala mailing list, in case they know. It would certainly make my code cleaner and easier to read! – F.X. Oct 27 '12 at 07:37
@F.X. I am trying Futures on Android and they seem to execute on main thread disregarding execution context like you said above. Were you able to understand why? – Aleyna Jan 04 '14 at 20:18
@Aleyna No, I wasn't, but as Alexey said, I wouldn't assume that it is so all the time. It may have to do with the fact that `future{"test"}` completes near-instantly, and the later completion blocks just retrieve the result, hence no use for a secondary thread. Longer-running futures may run their completion block in the worker itself, iirc the docs say nothing about this. – F.X. Jan 07 '14 at 08:32
