This is not the same question as "How does Google App Engine Autoscaling work?".
TL;DR: I have a GAE app with an endpoint that has to wait a while, something like 15 seconds, before it returns. It's not working particularly hard while it waits. I don't really have a way to implement polling or callbacks instead of the long wait because of third-party integrations.
I'm afraid that long wait times will make Google assume I'm under heavy load, so new GAE instances will get spawned for nothing, and requests might end up sitting in a queue when I actually have instances ready to take on new work.
The docs tell me "The decision [of whether to spin up a new instance or not, and perhaps whether to send an incoming request to a queue or an instance] takes into account the number of available instances, how quickly your application has been serving requests (its latency), and how long it takes to spin up a new instance".
I want GAE to expect my application to respond slowly, but do everything else as normal.
I'm just struggling to find a straight answer in Google's documentation.
PS: I don't think this will work:
max_pending_latency: The maximum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it. The default value is "30ms".
The reason being: I'm talking about the latency between the request arriving at my view code and my view code returning, not time spent in the pending queue. If I allowed a 15-second wait in the queue, a client request might get bumped to the queue and just sit there while GAE thinks my app is under heavy load (it isn't). Then GAE would finally send the request to my long-waiting view, which would take a long time to return anything, causing GAE to freak out, queue up more requests, and try to spawn more instances anyway.
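For context, that setting lives under automatic_scaling in app.yaml. A minimal sketch of what I mean, just to be concrete (the values are placeholders, not what I actually run):

```yaml
# app.yaml (standard environment) -- placeholder values, only to show where these knobs live
automatic_scaling:
  # don't consider starting a new instance until a queued request has waited at least this long
  min_pending_latency: 30ms
  # start a new instance once a queued request has been waiting this long
  max_pending_latency: 1s
```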
Here is what I believe is happening (vastly simplified):
- a request arrives
- GAE checks if there is an instance that can handle it. There is, so the request goes to instanceA (which now chugs along for a while)
- after 10 seconds another request arrives
- GAE puts it in a queue and checks if instanceA can handle it
- instanceA seems to be under heavy load, so GAE won't just send the request to it
- the request will sit in the queue for a while
- at some point in time the request will go to an instance. Either:
- instanceA finally returns and the load estimate is now lower so instanceA gets it
- instanceB is created and fed the new request
Here is what I want to happen:
- a request arrives
- GAE checks if there is an instance that can handle it. There is, so the request goes to instanceA (which now chugs along for a while)
- after 10 seconds another request arrives
- GAE puts it in a queue and checks if instanceA can handle it
- instanceA isn't terribly busy so the request goes there
- I feel the warm glow of a job well done
As far as I can tell, the only way to get this behavior without spawning a million instances and having long queue times is to not use GAE at all.
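For reference, this is roughly the automatic_scaling block I've been staring at while trying to figure this out. The values are placeholders and I'm not claiming any of them solve the problem; I'm mostly wondering whether max_concurrent_requests is the knob that decides when an instance looks "busy":

```yaml
# app.yaml -- placeholder values, shown only to make clear which knobs I mean
automatic_scaling:
  # number of concurrent requests a single instance will accept
  # before the scheduler considers starting another instance
  max_concurrent_requests: 10
  # idle instances kept running to absorb sudden traffic without a cold start
  min_idle_instances: 1
  max_idle_instances: automatic
```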