We have an API implemented in bare-bones Scala Akka HTTP - a couple of routes fronting a heavy computation (CPU and memory intensive). No clustering - everything runs on one beefy machine. The computation is properly heavy - a single isolated request can take more than 60s to complete. We don't care about speed that much. There's no blocking IO, just lots of CPU processing.
When I started performance testing the thing, an interesting pattern showed up: say requests A1, A2, ..., A10 come through. They use resources quite heavily, and it turns out that Akka returns HTTP 503 for requests A5-A10, which overran the request timeout. The problem is that their computations keep running even though there's no one left to pick up the results.
From there we see a cascading performance collapse: requests A11-A20 arrive at a server that is still working on requests A5-A10. These new requests also have a chance of overrunning - an even higher one, given that the server is busier. So some of them will still be running when Akka triggers the timeout, making the server even busier and slower, and then the next batch of requests comes through... After running the system for a bit, nearly all requests after a certain point fail with timeouts. And after you stop the load, the logs show some requests still being worked on.
I've tried running the computation in a separate ExecutionContext as well as on the system dispatcher, and tried making it fully asynchronous (via Future composition), but the result is the same: lingering jobs make the server so busy that eventually almost every request fails.
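To illustrate the lingering-job behaviour outside Akka, here is a minimal sketch using only the Scala standard library. It is not our actual code: `LingeringJobDemo`, `heavyComputation` and the timings are made up for illustration, and `Await.result` with a timeout merely stands in for Akka's request timeout. It shows that when the waiting side gives up, nothing tells the `Future` to stop, so it keeps occupying a pool thread until it finishes.

```scala
import java.util.concurrent.{Executors, TimeoutException}
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LingeringJobDemo {
  // Hypothetical stand-in for the heavy computation:
  // busy-loops on the CPU for roughly `millis` milliseconds.
  def heavyComputation(millis: Long): Long = {
    val deadline = System.nanoTime() + millis * 1000000L
    var acc = 0L
    while (System.nanoTime() < deadline) acc += 1
    acc
  }

  // Returns (callerTimedOut, jobFinishedAnyway).
  def run(jobMillis: Long, timeoutMillis: Long): (Boolean, Boolean) = {
    // Dedicated pool, similar in spirit to a separate ExecutionContext.
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val finished = new AtomicBoolean(false)
    try {
      val job = Future {
        heavyComputation(jobMillis)
        finished.set(true)
      }
      // The "client" gives up after timeoutMillis (stand-in for the 503).
      val timedOut =
        try { Await.result(job, timeoutMillis.millis); false }
        catch { case _: TimeoutException => true }
      // The timeout did not cancel anything: the job still runs to completion.
      Await.ready(job, (jobMillis * 10 + 1000).millis)
      (timedOut, finished.get())
    } finally pool.shutdown()
  }
}
```

Calling `LingeringJobDemo.run(300, 50)` gives `(true, true)`: the caller timed out, yet the job still ran to the end and used CPU the whole time. This is the same effect we see behind the 503s, only on a small scale.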
A similar case is described in https://github.com/zcox/spray-blocking-test but the focus there is different - /ping doesn't matter to us as much as more or less stable responsiveness on the endpoint that handles the long-running requests.
The question: how do I design my application to be better at interrupting hanging requests? I can tolerate some small percentage of failed requests under heavy load, but grinding the entire system to a halt after several seconds is unacceptable.