App engine 499 HTTP response for long running tasks

Question

We have a REST API written in Spring Boot (Java 8) and hosted on the App Engine flexible environment.

Our current app.yaml looks like this:

runtime: java
api_version: '1.0'
env: flex
threadsafe: true
manual_scaling:
  instances: 1
resources:
  cpu: 4
  memory_gb: 16
liveness_check:
  path: "/healthcheck"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  initial_delay_sec: 300

We have noticed that long-running (1-5 min) requests to one of the endpoints usually get a 499 HTTP response code, which we did not expect.

The sequence looked something like this: POST /endpoint -> (work starts in the application) -> 499 response is sent back -> (work keeps running in the application thread) -> caller repeats the request -> two identical requests are now running -> and so on.

To avoid this, we moved the endpoint's work into a background ThreadPoolTaskExecutor, and that endpoint no longer causes any problems.
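Moving the work off the request thread, as described above, can be sketched roughly as follows. This is a minimal, hypothetical sketch: it uses the JDK's ExecutorService in place of Spring's ThreadPoolTaskExecutor so it runs without Spring on the classpath, and the class name BackgroundJobs, the caller-supplied idempotency key, and the UNKNOWN/RUNNING/DONE statuses are all illustrative assumptions. A controller would call submit(...) and answer immediately (e.g. 202 Accepted with the key), so a repeated request with the same key reuses the running job instead of starting a second one.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical job registry: a POST handler submits the slow work here and replies
 * immediately, so no HTTP request stays open for minutes. A GET handler can then
 * report progress for the same key.
 */
final class BackgroundJobs {
    // Stands in for Spring's ThreadPoolTaskExecutor; the pool size is an arbitrary example.
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    // Maps a caller-supplied idempotency key to the job started for it.
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Starts the work at most once per key; a repeated request reuses the existing job. */
    Future<String> submit(String requestKey, Callable<String> work) {
        return jobs.computeIfAbsent(requestKey, key -> executor.submit(work));
    }

    /** What a (hypothetical) status endpoint could return for the given key. */
    String status(String requestKey) {
        Future<String> job = jobs.get(requestKey);
        if (job == null) {
            return "UNKNOWN";
        }
        return job.isDone() ? "DONE" : "RUNNING";
    }

    void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundJobs registry = new BackgroundJobs();
        AtomicInteger runs = new AtomicInteger();
        Callable<String> slowWork = () -> {
            runs.incrementAndGet();
            Thread.sleep(200); // simulates minutes of processing
            return "report";
        };
        Future<String> first = registry.submit("order-42", slowWork);
        Future<String> retry = registry.submit("order-42", slowWork); // caller repeats the request
        System.out.println("same job: " + (first == retry));
        System.out.println("result: " + first.get() + ", runs: " + runs.get());
        System.out.println("status: " + registry.status("order-42"));
        registry.shutdown();
    }
}
```

Because the registry lives in memory, it only deduplicates requests that reach the same instance and is lost on restart; with manual_scaling set to one instance that may be acceptable, but a real version would also need to evict finished entries.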

But another issue appeared: a different endpoint, /second, now gets the same early 499 response even though it never did before (its usual run time ranges from 1 second when cached up to 3 minutes when not).

This makes me believe that the App Engine scheduler/load balancer somehow decides how long requests may take before they are aborted. Are we missing some kind of timeout configuration? Can anyone identify where the problem could be (Tomcat, GCE, Spring, ...)?

score 0 · Answer 1 · answered Jul 30 '18 at 14:26

The error code is described here:

The operation was cancelled, typically by the caller.

HTTP Mapping: 499 Client Closed Request

I suspect that the issue is the following:

POST /endpoint -> (work starts in the application) -> client disconnects -> **499 is logged**, no response is sent back -> (work keeps running in the application thread) -> caller repeats the request -> two identical requests are now running -> and so on.

This might happen because the request takes too long and the client triggers a refresh or repeats the request. Check this post discussing the same error. There might not be anything wrong with the code or the GCP infrastructure.
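The suspected sequence can be reproduced locally with a minimal sketch that uses only the JDK's built-in com.sun.net.httpserver.HttpServer and HttpURLConnection, with no App Engine, Tomcat, or Spring involved. The endpoint path, the simulated work duration, and the client read timeout are arbitrary assumptions chosen to keep the demo fast. When the read timeout is shorter than the work, the client gives up with a SocketTimeoutException while the handler still runs to completion, which matches the pattern described in the question.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Local reproduction: the client stops waiting, but the server-side work keeps going. */
final class ClientTimeoutDemo {
    static final class Outcome {
        final boolean clientTimedOut;
        final boolean serverWorkFinished;

        Outcome(boolean clientTimedOut, boolean serverWorkFinished) {
            this.clientTimedOut = clientTimedOut;
            this.serverWorkFinished = serverWorkFinished;
        }
    }

    static Outcome run(int workMillis, int clientReadTimeoutMillis) throws Exception {
        CountDownLatch workFinished = new CountDownLatch(1);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/endpoint", exchange -> {
            try {
                Thread.sleep(workMillis); // stands in for the long-running request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            workFinished.countDown(); // the work completes whether or not anyone is waiting
            byte[] body = "done".getBytes(StandardCharsets.UTF_8);
            try {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } catch (IOException clientGone) {
                // The client may already have closed the connection; a fronting
                // proxy would log this situation as 499 Client Closed Request.
            } finally {
                exchange.close();
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/endpoint");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setReadTimeout(clientReadTimeoutMillis); // how long the caller is willing to wait
            boolean timedOut;
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response body
                }
                timedOut = false;
            } catch (SocketTimeoutException e) {
                timedOut = true; // the caller gives up first
            } finally {
                conn.disconnect();
            }
            boolean finished = workFinished.await(workMillis + 5000L, TimeUnit.MILLISECONDS);
            return new Outcome(timedOut, finished);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        Outcome slow = run(1000, 200);
        System.out.println("client timed out: " + slow.clientTimedOut
                + ", server work finished: " + slow.serverWorkFinished);
    }
}
```

Raising the client's read timeout above the work duration (or moving the work to the background, as the question already did for the first endpoint) makes the early disconnect disappear in this model.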

There is a 60-minute limit on request duration in the App Engine flexible environment, so that is not the issue. Neither is the load balancer, if you are not using SSL connections, because it would not be involved.

I am not able to identify where the problem could be. Can you share more information about the error or the long-running request, in case it sheds some light?

We have found this issue when sending requests with Postman, and Spring Boot's RestTemplate also runs into this timeout. I believe our requests are routed over SSL; do you have any resources that would indicate whether the load balancer has any limits? — Valdas, Aug 16 '18 at 07:26
