We are seeing some strange behavior in our distributed apps. We haven't yet found the cause, but we think it might be related to OutOfMemory errors.

We try to follow good coding practice regarding fatal errors, such as never catching all Throwables, and at most the NonFatal ones. But I realized that there is something I don't quite understand about fatal errors happening in a Future, and pretty much all of our code is wrapped in a Future at some point.

Here is a minimal example:

val f = Future { /* some code that makes an OOM error */ }
val result = Await.result(f, 1.minute)

What happens is:

  • The thread running the future code fails. The OOM is printed to stderr, but we don't see it because the app is deployed "somewhere" and we didn't redirect stderr.
  • However, the Future never completes (it catches only NonFatal errors). Worse, it doesn't free its resources.
  • After 1 minute we get a TimeoutException with no relation to the OOM. Hopefully, that releases the resources. But we have lost time, and other threads might be affected. We then handle it as a future that didn't have time to finish, just as if some DB access didn't respond in time, i.e. we'll typically try again.

I found a good description of the issue here: https://github.com/scala/bug/issues/9554

My question: how should we handle fatal errors happening in a Future?

  • At least, the whole app should fail like it would if a fatal error happened in the main thread, maybe with a core dump.
  • At best, have proper management to: log the exception, apply a suitable re-execution pattern, maybe gracefully kill other running futures/threads, ...

Note: this is a similar problem to "Exception causes Future to never complete", but the answer there is "this is intended", not how to manage it.

Juh_
  • Have you considered this solution? https://stackoverflow.com/a/3878199/227803 – Viktor Klang Aug 23 '18 at 12:04
  • Nope. Actually it is the same problem in java and this solution can be applied here (and this is a valid answer for my question). OOM is my primary concern, but do you know if it can be applied to other fatal errors? – Juh_ Aug 23 '18 at 12:08
  • Try Scala 2.13.0-M5 once that is released; I've reworked the handling of fatal errors, but I don't know how it will affect your specific code. – Viktor Klang Aug 23 '18 at 13:06
  • Thanks, I'll look into it when it comes out – Juh_ Aug 23 '18 at 14:07

1 Answer

I found a way. It requires using our own ExecutionContext, built on an Executor that has our own UncaughtExceptionHandler:

import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

// UncaughtExceptionHandler that processes fatal errors
val exceptionHandler = new Thread.UncaughtExceptionHandler {
  override def uncaughtException(thread: Thread, err: Throwable): Unit = err match {
    case NonFatal(_) =>
      // don't process NonFatal errors, which should be managed by the processing code

    case fatal =>
      // process the fatal error (`log` is the application's logger)
      log.error("FATAL ERROR", fatal)
      System.exit(1)
  }
}

// make the ExecutionContext with our UncaughtExceptionHandler
//   here I chose a ForkJoinPool, which is the type of pool behind EC.global
//   I hard-coded the thread number to 8; global uses the number of available cores
val fjp = new ForkJoinPool(8, ForkJoinPool.defaultForkJoinWorkerThreadFactory, exceptionHandler, false)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(fjp)

// used implicitly in the future code
val f = Future { /* some code that makes an OOM error */ }
val result = Await.result(f, 1.minute)
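
If exiting on every throwable that is not NonFatal turns out to be too broad (control-flow throwables and interrupts are not NonFatal either), the decision can be pulled out into a small function. This is only a minimal sketch: the names `Action`, `Ignore`, `LogOnly`, `ExitJvm` and `classify` are hypothetical, and which errors deserve which action is an assumption to adapt to the application.

```scala
import scala.util.control.{ControlThrowable, NonFatal}

// Hypothetical set of reactions the handler can choose from
sealed trait Action
case object Ignore  extends Action // left to the processing code
case object LogOnly extends Action // reported, but the JVM keeps running
case object ExitJvm extends Action // the app cannot safely continue

// Decide what to do with a throwable that reached the handler
def classify(err: Throwable): Action = err match {
  case NonFatal(_)                                   => Ignore
  case _: ControlThrowable | _: InterruptedException => LogOnly
  case _                                             => ExitJvm // e.g. OutOfMemoryError, LinkageError
}
```

In the handler above, `uncaughtException` would then match on `classify(err)` and call `System.exit(1)` only for `ExitJvm`.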
Juh_
  • `case t` will probably catch too many exceptions. You don't want control exceptions to kill your JVM. Use `-XX:+ExitOnOutOfMemoryError` instead. – crak Aug 23 '18 at 13:47
  • Yup, here the error management is kept very simple to illustrate my case (and it fits my first requirement, "at least, the whole app should fail"). This is probably not what we want to do, or not for all fatal errors. – Juh_ Aug 23 '18 at 13:53