Akka HTTP and long running requests

Question

We have an API implemented in bare-bones Scala Akka HTTP - a couple of routes fronting a heavy computation (CPU- and memory-intensive). No clustering - all running on one beefy machine. The computation is properly heavy - a single isolated request can take more than 60s to complete. And we don't care about the speed that much. There's no blocking IO, just lots of CPU processing.

When I started performance testing the thing, an interesting pattern showed: say requests A1, A2, ..., A10 come through. They use resources quite heavily and it turns out that Akka will return HTTP 503 for requests A5-A10, which overran the request timeout. The problem is that the computation for those requests is still running even though there's no one there to pick up the result.

And from there we see a cascading performance collapse: requests A11-A20 arrive at a server still working on requests A5-A10. Clearly these new requests also have a chance of overrunning, and an even higher one given that the server is busier. So some of them will still be running when Akka triggers the timeout, making the server even busier and slower, and then the next batch of requests comes through... After running the system for a bit, nearly all requests after a certain point start failing with timeouts. And after you stop the load, the logs show some requests still being worked on.

I've tried running the computation in a separate ExecutionContext as well as on the system dispatcher, trying to make it fully asynchronous (via Future composition), but the result is still the same. Lingering jobs make the server so busy that eventually almost every request fails.

A similar case is described in https://github.com/zcox/spray-blocking-test but the focus there is different - /ping doesn't matter for us as much as more or less stable responsiveness on the endpoint that handles long running requests.

The question: how do I design my application to be better at interrupting hanging requests? I can tolerate some small percentage of failed requests under heavy load, but grinding the entire system to a halt after several seconds is unacceptable.

Pretty broad question. In short, you should deny some requests immediately under heavy load (just say sorry, or track users who run too many computations, or maintain a queue). You should also use separate contexts for your computation, database, spray... so that your HTTP front-end layer always responds. — Nikita, Nov 24 '16 at 15:38
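
One way to read the "deny some requests immediately" suggestion is an admission gate in front of the computation. What follows is only a minimal sketch using the standard library: `AdmissionGate`, `tryRun` and `maxInFlight` are hypothetical names, not part of Akka HTTP, and the limit would have to be tuned to the machine.

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Hypothetical helper: admits at most `maxInFlight` computations at once
// and rejects the rest immediately instead of queueing them.
final class AdmissionGate(maxInFlight: Int) {
  private val permits = new Semaphore(maxInFlight)

  /** Runs `work` if a permit is free; otherwise returns None without starting it. */
  def tryRun[A](work: => Future[A])(implicit ec: ExecutionContext): Option[Future[A]] =
    if (permits.tryAcquire()) {
      // Turn a synchronous failure into a failed Future so the permit is still released.
      val started: Future[A] =
        try work
        catch { case NonFatal(e) => Future.failed(e) }
      // Release the permit whether the computation succeeds or fails.
      started.onComplete(_ => permits.release())
      Some(started)
    } else None
}
```

In a route, the `None` case would be mapped to an immediate rejection (for example a 503 or 429 response), so a rejected request never starts a computation that nobody will collect.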

score 1 · Answer 1 · edited May 23 '17 at 12:07

Akka HTTP does not automatically terminate processing for requests which have timed out. Usually the extra bookkeeping which would be needed to do that would not pay off, so it's not on by default. I think it's something of an oversight, TBH, and I've had similar problems with Akka HTTP myself.

I think you need to manually abort the processing on request timeout, otherwise the server will not recover when it is overloaded, as you have seen.

There isn't a standard mechanism with which you can implement this (see "How to cancel Future in Scala?"). If the thread is doing CPU work with no I/O, then Thread.interrupt() will not be useful, because it only takes effect when the code blocks or checks the interrupt flag itself. Instead you should create a Deadline or Promise or similar that shows whether the request is still open, pass it around, and periodically check for timeout during your computation:

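// in the HTTP server class:
import scala.concurrent.duration._

// FiniteDuration (not Duration), because only a finite duration has fromNow
val responseTimeout: FiniteDuration = 30.seconds

val routes =
  path("slowComputation") {
    complete {
      val responseTimeoutDeadline: Deadline = responseTimeout.fromNow
      computeSlowResult(responseTimeoutDeadline)
    }
  }

// in the processing code (an implicit ExecutionContext must be in scope):
import java.util.concurrent.TimeoutException

def computeSlowResult(responseDeadline: Deadline): Future[HttpResponse] = Future {
  val gatherInputs: List[Int] = ???
  val result = gatherInputs.foldLeft(0) { (acc, next) =>

    // check if the response has timed out
    if (responseDeadline.isOverdue())
      throw new TimeoutException()

    acc + next // proceed with the calculation a little
  }
  HttpResponse(entity = result.toString) // wrap the result for the route
}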

(Checking if a Promise has been completed will be a lot cheaper than checking whether a Deadline has expired. I've put the code for the latter above, as it's easier to write.)
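
For comparison, the Promise variant mentioned in the parenthesis above might look roughly like this. It is a sketch under assumptions: `PromiseCancellation`, `cancelAfter` and `sumUnlessCancelled` are made-up names, and the promise is completed here by a plain JDK timer, whereas in a real server whatever detects the request timeout would complete it.

```scala
import java.util.concurrent.{CancellationException, Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object PromiseCancellation {
  // Single daemon thread that completes cancellation promises when their timeout elapses.
  private val timer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "cancel-timer")
      t.setDaemon(true)
      t
    }
  })

  /** Returns a promise that is completed once `timeout` has elapsed. */
  def cancelAfter(timeout: FiniteDuration): Promise[Unit] = {
    val cancelled = Promise[Unit]()
    timer.schedule(
      new Runnable { def run(): Unit = { cancelled.trySuccess(()); () } },
      timeout.toMillis, TimeUnit.MILLISECONDS)
    cancelled
  }

  /** Sums the inputs, giving up as soon as `cancelled` has been completed. */
  def sumUnlessCancelled(inputs: Iterator[Long], cancelled: Promise[Unit])
                        (implicit ec: ExecutionContext): Future[Long] = Future {
    inputs.foldLeft(0L) { (acc, next) =>
      // isCompleted reads the promise's state; no clock call per step.
      if (cancelled.isCompleted)
        throw new CancellationException("request timed out")
      acc + next
    }
  }
}
```

The computation code only sees the promise, so the same check works whether the cancellation comes from a timer, a client disconnect, or an operator.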

Gave this a go - seems to have improved things a bit, at least server doesn't collapse completely. Thankfully on this endpoint we had a more or less sequential process so I injected deadline testing into functions passed to `Future#flatMap`. Wonder what the solution might be for a more general case, when there's no traversable sequence of inputs/steps? — Anton, Dec 05 '16 at 14:22
"a more general case, when there's no traversable sequence of inputs/steps" -- I think this is the general solution. There will always be a link between an operation and the HTTP request that is waiting for it to complete, otherwise how would the output get to the client? — Rich, Dec 05 '16 at 17:08

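For the "no traversable sequence of steps" case raised in the comments above, the same idea can be threaded through arbitrary code, for example a recursive function. This is a hedged sketch, not a general framework: `Checkpoint` and its `every` parameter are hypothetical, and the token is meant to be used from a single thread.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.duration._

object DeadlineChecks {
  /** Hypothetical token passed down through the computation (not thread-safe). */
  final class Checkpoint(deadline: Deadline, every: Int = 1024) {
    private var calls = 0L

    /** Throws once the deadline has passed; consults the clock only every `every` calls. */
    def check(): Unit = {
      calls += 1
      if (calls % every == 0 && deadline.isOverdue())
        throw new TimeoutException("request deadline passed")
    }
  }

  // Deliberately slow recursive computation with no input list to fold over.
  def fib(n: Int, cp: Checkpoint): BigInt = {
    cp.check()
    if (n < 2) BigInt(n) else fib(n - 1, cp) + fib(n - 2, cp)
  }
}
```

Checking the clock only every `every` calls keeps the overhead small, at the cost of noticing the timeout slightly late.
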
Abhijit Sarkar · Answer 2 · 2017-01-16T23:24:59.957

The spray-blocking-test uses libraries that I don't think exist in Akka HTTP. I had a similar problem and I solved it as follows:

application.conf

blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  throughput = 1
}

Route

complete {
  Try(new URL(url)) match {
    case scala.util.Success(u) => {
      val src = Source.fromIterator(() => parseMovies(u).iterator)

      src
        .via(findMovieByTitleAndYear)
        .via(persistMovies)
        .completionTimeout(5.seconds)
        .toMat(Sink.fold(Future(0))((acc, elem) => Applicative[Future].map2(acc, elem)(_ + _)))(Keep.right)
        // run the whole graph on a separate dispatcher
        .withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))
        .run.flatten
        .onComplete {
          case scala.util.Success(n) => logger.info(s"Created $n movies")
          case Failure(t) => logger.error(t, "Failed to process movies")
        }

      Accepted
    }
    case Failure(t) => logger.error(t, "Bad URL"); BadRequest -> "Bad URL"
  }
}

The response returns immediately while the processing keeps happening in the background.

Additional reading:

http://doc.akka.io/docs/akka/current/scala/dispatchers.html
http://blog.akka.io/streams/2016/07/06/threading-and-concurrency-in-akka-streams-explained

If I understand correctly, in your use case you don't need to return the result of the computation in the HTTP response? — Anton, Jan 16 '17 at 10:01
@Anton Correct. If the result was needed, I'd be forced to wait. — Abhijit Sarkar, Jan 16 '17 at 11:25
I think our problems are orthogonal then - executing the task in the background is fairly simple with a separate dispatcher, the problem is in stopping them when they're no longer needed - if they have no effect and only return the value to the requestor. My problem is requestor giving up after the timeout and server still running — Anton, Jan 16 '17 at 17:47
@Anton I updated my answer to show a timeout as well. You can do a timeout on the whole graph before materializing it, or on each stage. Where it gets hairy is that "canceling" a process depends on the system that's running it. If you make a DB request, some drivers will let you issue a cancel request, but it's up to the DB to honor it. — Abhijit Sarkar, Jan 16 '17 at 20:27
