This is not the same question as "How does Google App Engine Autoscaling work?".
TL;DR: I have a GAE app with an endpoint that has to wait a while, something like 15 seconds, before it returns. It's not working particularly hard while it waits. I don't really have a way to implement polling or callbacks instead of the long wait because of third-party integrations.
I'm afraid that long wait times will make Google assume I'm under heavy load, so new GAE instances will get spawned for nothing, and requests might end up sitting in a queue when I actually have instances ready to take on new work.
The docs tell me "The decision [of whether to spin up a new instance or not, and perhaps whether to send an incoming request to a queue or an instance] takes into account the number of available instances, how quickly your application has been serving requests (its latency), and how long it takes to spin up a new instance".
I want GAE to expect my application to respond slowly, but do everything else as normal.
I'm just struggling to find a straight answer in Google's documentation.
PS: I don't think this will work:
max_pending_latency: The maximum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it. The default value is "30ms".
The reason being: I'm talking about the latency between the request arriving at my view code and my view code returning, not time spent in the pending queue. If I allowed a 15-second wait in the queue, a client request might get bumped to the queue and just sit there while GAE thinks my app is under heavy load (it isn't). Then GAE would finally send the request to my long-waiting view, which would take a long time to return anything, causing GAE to freak out, queue up more requests, and try to spawn more instances anyway.
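For context, that setting lives under automatic_scaling in app.yaml. A minimal sketch of what I mean, just to be concrete (the values are placeholders, not what I actually run):

```yaml
# app.yaml (standard environment) -- placeholder values, only to show where these knobs live
automatic_scaling:
  # don't consider starting a new instance until a queued request has waited at least this long
  min_pending_latency: 30ms
  # start a new instance once a queued request has been waiting this long
  max_pending_latency: 1s
```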
Here is what I believe is happening (vastly simplified):
- a request arrives
- GAE checks if there is an instance that can handle it. There is, so the request goes to instanceA (which now chugs along for a while)
- after 10 seconds another request arrives
- GAE puts it in a queue and checks if instanceA can handle it
- instanceA seems to be under heavy load, so GAE won't just send the request to it
- the request will sit in the queue for a while
- at some point in time the request will go to an instance. Either:
- instanceA finally returns and the load estimate is now lower so instanceA gets it
- instanceB is created and fed the new request
Here is what I want to happen:
- a request arrives
- GAE checks if there is an instance that can handle it. There is, so the request goes to instanceA (which now chugs along for a while)
- after 10 seconds another request arrives
- GAE puts it in a queue and checks if instanceA can handle it
- instanceA isn't terribly busy so the request goes there
- I feel the warm glow of a job well done
As far as I can tell, the only way to get this behavior without spawning a million instances and having long queue times is to not use GAE at all.
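For reference, this is roughly the automatic_scaling block I've been staring at while trying to figure this out. The values are placeholders and I'm not claiming any of them solve the problem; I'm mostly wondering whether max_concurrent_requests is the knob that decides when an instance looks "busy":

```yaml
# app.yaml -- placeholder values, shown only to make clear which knobs I mean
automatic_scaling:
  # number of concurrent requests a single instance will accept
  # before the scheduler considers starting another instance
  max_concurrent_requests: 10
  # idle instances kept running to absorb sudden traffic without a cold start
  min_idle_instances: 1
  max_idle_instances: automatic
```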