I have an App Engine flex app that takes requests for some background computations and puts them in a Cloud Tasks queue for processing. Requests arrive at a fairly constant rate from another process.

After a fresh deploy, tasks are processed quickly (milliseconds), but latency soon grows to seconds, then minutes, until the service becomes completely clogged. I notice in Cloud Tasks that there are tasks marked as running even when the queue shows no pending tasks. These seem to hold on to instance resources and stay stuck for hours, well beyond any configured timeout. Once my instances are clogged with these stuck tasks, my other process can't make requests without timing out, even with a very high timeout. With auto-scaling, I thought App Engine was supposed to spin up more instances to handle incoming requests (source).
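For context, the other process enqueues work roughly like this (PROJECT, LOCATION, the queue name, and the /process_task route are placeholders, not my real identifiers):

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# PROJECT, LOCATION, and the queue name are placeholders.
parent = client.queue_path("PROJECT", "LOCATION", "pipeline-queue")

task = {
    "app_engine_http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "relative_uri": "/process_task",  # handler route on the pipeline service
        "app_engine_routing": {"service": "pipeline"},
        "body": b'{"job_id": "123"}',
    }
}
client.create_task(parent=parent, task=task)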
The task handlers themselves are not terribly complicated: they just do some operations on a Google Spanner database and read from / write to GCS, so they are I/O-intensive rather than CPU-bound.
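A simplified sketch of what a handler does (route, instance, database, and bucket names are placeholders, not my real code):

from flask import Flask, request
from google.cloud import spanner, storage

app = Flask(__name__)
spanner_client = spanner.Client()
storage_client = storage.Client()

@app.route("/process_task", methods=["POST"])
def process_task():
    payload = request.get_json(force=True)
    # Read some rows from Spanner.
    database = spanner_client.instance("my-instance").database("my-db")
    with database.snapshot() as snapshot:
        rows = list(snapshot.execute_sql(
            "SELECT data FROM jobs WHERE job_id = @job_id",
            params={"job_id": payload["job_id"]},
            param_types={"job_id": spanner.param_types.STRING},
        ))
    # Write the result to GCS.
    bucket = storage_client.bucket("my-results-bucket")
    bucket.blob(f"results/{payload['job_id']}.txt").upload_from_string(str(rows))
    return "", 200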
App configuration:
runtime: python
env: flex
service: pipeline
entrypoint: gunicorn -b :$PORT main:app --timeout 300
threadsafe: true
runtime_config:
  python_version: 3
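I haven't set any explicit scaling options, so the flex automatic-scaling defaults should apply; as I understand the docs, that is equivalent to adding something like:

automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cool_down_period_sec: 120
  cpu_utilization:
    target_utilization: 0.5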
Queue configuration:
app_engine_http_queue {
}
rate_limits {
  max_dispatches_per_second: 500.0
  max_burst_size: 100
  max_concurrent_dispatches: 1000
}
retry_config {
  max_attempts: 100
  min_backoff {
    nanos: 100000000
  }
  max_backoff {
    seconds: 3600
  }
  max_doublings: 16
}
state: RUNNING
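For reference, min_backoff of 100000000 nanos is 0.1 s, so failed tasks are retried aggressively at first. A quick sketch of the backoff schedule as I understand the Cloud Tasks doubling rules (my reading of the docs, not authoritative):

# Sketch of the queue's retry backoff schedule: the delay doubles
# max_doublings times, then grows linearly, capped at max_backoff.
MIN_BACKOFF = 0.1      # min_backoff: 100000000 nanos
MAX_BACKOFF = 3600.0   # max_backoff: 3600 seconds
MAX_DOUBLINGS = 16

def backoff(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based), in seconds."""
    doublings = min(attempt - 1, MAX_DOUBLINGS)
    delay = MIN_BACKOFF * (2 ** doublings)
    if attempt - 1 > MAX_DOUBLINGS:
        # After max_doublings, the delay increases linearly by the
        # last doubled step on every further retry.
        delay += MIN_BACKOFF * (2 ** MAX_DOUBLINGS) * (attempt - 1 - MAX_DOUBLINGS)
    return min(delay, MAX_BACKOFF)

for attempt in (1, 5, 10, 17, 30):
    print(attempt, backoff(attempt))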