
I'm trying to make sense of the `blocking` construct. While it's not entirely clear to me how it works internally, the general idea I got was that, as long as I use Scala's global thread-pool, wrapping my code in a `blocking` block would make the thread-pool create extra space for that job (since it's not CPU-bound).

  // imports assumed for all snippets below
  import scala.concurrent.{Future, blocking}
  import scala.concurrent.ExecutionContext.Implicits.global

  (1 to 1000).foreach { i =>
    Future {
      println(i)
      Thread.sleep(100 * 1000)
    }
  }

will quickly show that only 8 jobs can run simultaneously (the global pool's parallelism defaults to the number of available cores), while

  (1 to 1000).foreach { i =>
    Future {
      blocking {
        println(i)
        Thread.sleep(100 * 1000)
      }
    }
  }

will show that now we have around 250 simultaneous jobs. Wow! What then caught me off guard was that

  (1 to 1000).foreach { i =>
    Future {
      println(i)
      Thread.sleep(100 * 1000)
    }
  }

  ('a' to 'z').foreach { c =>
    Future {
      blocking {
        println(c)
        Thread.sleep(100 * 1000)
      }
    }
  }

will again show only 8 simultaneous jobs -- the blocking jobs won't get executed right away.
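
For reference, a simple counter like the following makes the concurrency limit easy to see (an illustrative sketch -- the `running` counter isn't part of the snippets above, and the peak depends on your core count):

  import java.util.concurrent.atomic.AtomicInteger
  // plus the Future / global imports from the first snippet

  val running = new AtomicInteger(0)

  (1 to 1000).foreach { i =>
    Future {
      val now = running.incrementAndGet()
      println(s"running: $now")   // without blocking, this peaks around the core count
      Thread.sleep(100 * 1000)
      running.decrementAndGet()
    }
  }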

Why is this? What really are the internal mechanics of the blocking context?

devoured elysium
  • The general idea is that if you want a long-running asynchronous process you use a Task library or Actors, and use `Future` for short, non-blocking operations. – Tim Mar 27 '19 at 09:09

2 Answers


`blocking` only takes effect once execution has actually entered the blocking context. Since you already have 8 non-blocking futures running, the pool won't start any new futures, so the queued ones never get to enter the blocking context. In other words, Scala doesn't "know" they're blocking until they start being executed.

You can think of the second snippet as working like this:

  1. The first future is created and started.
  2. The first future signals that it is blocking via a call to `blocking`, so the implementation makes room for more futures.
  3. Meanwhile, on the main thread, the second future is created and started.
  4. ...

Whereas your last snippet works like this:

  1. The first future is created and started. It does not signal that it is blocking.
  2. The second future does the same.
  3. ...
  4. The 9th future is created, but not started, as all 8 threads are occupied by non-blocking futures.
  5. ...
  6. The 1001st future (the first one in the second loop) is created, but not started, as all 8 threads are still occupied by non-blocking futures. Since it never starts, it never has the chance to tell the implementation that it is blocking by calling `blocking`.
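
One way to see that it's really about whether the future gets a chance to run (a sketch, assuming the same global ExecutionContext as in the question) is to submit the blocking futures first, so they enter the `blocking` context before the pool fills up:

  import scala.concurrent.{Future, blocking}
  import scala.concurrent.ExecutionContext.Implicits.global

  // Submitted first: each of these starts, enters `blocking`, and the pool
  // compensates with an extra thread, so all 26 end up running together.
  ('a' to 'z').foreach { c =>
    Future {
      blocking {
        println(c)
        Thread.sleep(100 * 1000)
      }
    }
  }

  // Submitted second: these never call `blocking`, so only ~8 of them
  // (one per core) run at a time alongside the 26 above.
  (1 to 1000).foreach { i =>
    Future {
      println(i)
      Thread.sleep(100 * 1000)
    }
  }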
Brian McCutchon
  • 7. When the first `blocking` `Future` finally runs, all 26 will run at the same time. – Tim Mar 27 '19 at 08:07
  • Hi, thanks for the reply! Your explanation makes sense. Unfortunately then it makes the `blocking {}` contexts a bit useless -- if I have an already clogged thread-pool then I'm stuck. I think I'm better off having two thread-pools, an unbounded IO thread-pool (or bounded at very high numbers) and a fixed one for CPU-bound tasks. – devoured elysium Mar 27 '19 at 08:55
  • @devouredelysium The real problem is that `Future` is designed for short, non-blocking operations. If you want a "thread pool" then look at a Task library or Actors. – Tim Mar 27 '19 at 09:11
  • @Tim I'm lost. Futures seem to be designed for any sort of async operations (even in Akka if you want to block, you're expected to use Futures!). – devoured elysium Mar 27 '19 at 11:16
  • @devouredelysium Akka doesn't want you to block (`Await.result`) but instead use an `Actor` or `Future`. What are you doing that requires blocking? – Tim Mar 27 '19 at 13:39
  • @Tim calling a remote service call, for instance. – devoured elysium Mar 27 '19 at 14:32
  • @devouredelysium The result of a remote service call would typically be wrapped in a `Future` so that is already asynchronous. If you post some sample code in a new question we should be able to help with avoiding blocking. – Tim Mar 27 '19 at 14:42
  • @Tim: if I execute a remote call that requires me to pass in a execution-context, I have to decide what execution context I want to give it. That's the whole reasoning of the OP. – devoured elysium Mar 27 '19 at 15:55
  • @devoured If you use `blocking` for all blocking calls, the problem in the OP should be much less pronounced. – Brian McCutchon Mar 27 '19 at 16:28
  • @BrianMcCutchon but then if the thread-pool already has cpu-bound threads running, it may so happen that no IO-bound future gets processed for a while. Am I correct? – devoured elysium Mar 27 '19 at 16:31
  • @devouredelysium If you want to do something CPU-bound on a shared pool, you should either wrap it in `blocking` or divide the task into several asynchronous steps so you allow other tasks to progress. It's cooperative task scheduling since the worker thread cannot be preempted to run *other* tasks. – Viktor Klang Mar 27 '19 at 19:37
  • @ViktorKlang you mean CPU-bound or IO-bound? – devoured elysium Mar 27 '19 at 19:52
  • @devouredelysium In practice it applies to both, because in both situations other tasks are *blocked* from making progress. So view `blocking`-blocks as "blocking others from making progress for a non-insignificant amount of time". – Viktor Klang Mar 27 '19 at 21:29
  • @devouredelysium In the CPU-bound case, adding more threads can be problematic because the extra threads get allocated time slices and reduce the performance of the CPU-bound workload. So `blocking` is typically only used for IO-bound or waiting sections, whereas for CPU-bound things it is typically better to divide the work up into multiple async steps. – Viktor Klang Mar 27 '19 at 21:31
  • My question is actually quite simple: I have a big, multi-threaded (server) application that will spawn loads of asynchronous tasks. Most of them will involve some external call (IO). Should I use a fixed thread-pool for CPU (with the computer's core count) + an unbounded cached thread-pool for IO calls, or should I always use Scala's global, making use of the `blocking` context? – devoured elysium Mar 27 '19 at 22:09 (a sketch of the two-pool option follows these comments)
  • I guess the question is if you can make those IO external calls non-blocking. That'd be the best option. – Viktor Klang Mar 28 '19 at 12:56
  • @ViktorKlang Looks like that's also what's being asked in https://stackoverflow.com/questions/38014748/best-executioncontext-for-io and https://stackoverflow.com/questions/23940519/default-executioncontext-with-blocking-calls -- what to do when your multithreaded server application has to call legacy blocking IO APIs (when there is no non-blocking alternative) – Daniel Silva Apr 19 '19 at 22:09
  • It seems that the comment and the two other questions boil down to when to use scala.concurrent.blocking and when to use a second ExecutionContext dedicated to legacy blocking IO – Daniel Silva Apr 19 '19 at 22:13
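
For what it's worth, here is a minimal sketch of the two-pool setup discussed in these comments (the pool name and the `callRemoteService` stand-in are illustrative assumptions, not a specific recommendation):

  import java.util.concurrent.Executors
  import scala.concurrent.{ExecutionContext, Future}

  // CPU-bound work stays on the default global pool (sized to the core count).
  import scala.concurrent.ExecutionContext.Implicits.global

  // A separate, cached (effectively unbounded) pool dedicated to blocking IO.
  val blockingIoEc: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

  def cpuWork(n: Int): Future[Int] =
    Future(n * n)        // runs on the global pool

  def callRemoteService(id: Int): Future[String] =
    Future {
      Thread.sleep(1000) // stand-in for a blocking remote call
      s"response $id"
    }(blockingIoEc)      // explicitly run on the dedicated IO pool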

Your first loop starts 8 threads and queues up the remaining 992 futures, saturating the pool before the second loop even runs.

Not sure what it is, specifically, that "caught you off guard". Once the first foreach call completes, it'll move on to the second one and start 26 more then.

Dima
  • Hi. Please re-read what I wrote. If I run the 2nd code snippet, it will create *two hundred and fifty* simultaneously running threads. It doesn't do it in the final code snippet. – devoured elysium Mar 26 '19 at 22:52
  • Please reread what _I wrote_ :) It doesn't do it in the final snippet, because the final snippet _starts_ with the first snippet ... which starts 8 threads and waits for them to complete before it can continue. – Dima Mar 26 '19 at 23:54