Some weeks ago my app on App Engine started scaling up to an unreasonably high number of idle instances, even when there is close to zero traffic. This of course affects my bill, which is skyrocketing.

My app is a simple Node.js application serving a GraphQL API that connects to my Cloud SQL database.

Why are all these idle instances being started?

My app.yaml:

runtime: nodejs12
service: default

handlers:
  - url: /.*
    script: auto
    secure: always
    redirect_http_response_code: 301

automatic_scaling:
  max_idle_instances: 1

Screenshot of monitoring:

[Two screenshots of the monitoring dashboard; images not available]

jln-dk

3 Answers

3

This is very strange behavior. According to the documentation, the number of idle instances should only temporarily exceed max_idle_instances:

Note: When settling back to normal levels after a load spike, the number of idle instances can temporarily exceed your specified maximum. However, you will not be charged for more instances than the maximum number you've specified.

Some possible solutions:

  1. Confirm that the configuration shown in the App Engine console matches your actual app.yaml.

  2. Set min_idle_instances to 1 and max_idle_instances to 2 (temporarily) and redeploy the application. It could be that there is just something wrong on the scaling side, and redeploying the application could solve this.

  3. Check your logs (filtered on App Engine) for any problems shutting down the idle instances.

  4. Finally, you could tweak settings like max_pending_latency. I have seen people build applications that take 2-3 seconds to start up, while by default a request waits only 30ms in the pending queue before another instance is spun up.

  5. This post suggests setting the following, which you could try:

    instance_class: F1
    automatic_scaling:
      max_idle_instances: 1  # default value
      min_pending_latency: automatic  # default value
      max_pending_latency: 30ms
    
  6. Switch to basic_scaling, which creates instances on demand and shuts each one down once it has been idle for idle_timeout (last-resort option). This would look something like this:

    basic_scaling:
      max_instances: 5
      idle_timeout: 15m
    

The solution could of course also be a combination of 2 and 4.
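
As a rough sketch, combining 2 and 4 in a single app.yaml might look like the following. The values are illustrative assumptions, not recommendations: the idle-instance bounds come from suggestion 2, and the 3s latency is only a guess based on the 2-3 second startup time mentioned in suggestion 4.

```yaml
runtime: nodejs12
service: default

# handlers omitted here; keep your existing handlers section unchanged

automatic_scaling:
  min_idle_instances: 1     # suggestion 2 (meant as a temporary setting)
  max_idle_instances: 2     # suggestion 2 (meant as a temporary setting)
  max_pending_latency: 3s   # suggestion 4; assumed value, tune to your app's startup time
```

Redeploy after changing these so that a new version picks up the settings, then compare what the App Engine console shows for that version against this file.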

Cloudkollektiv
  • Thank you for the suggestions. I have now deployed a new version with your solution #2 ("min" and "max" idle instances). I can confirm from the App Engine console that the config used is what I expected (read: it has "min" and "max" idle instances set). I'll wait and see if that has an impact. – jln-dk Oct 15 '20 at 10:05
  • Update after 1 hour: No change in idle instances. Still the same as on the screenshots. – jln-dk Oct 15 '20 at 11:38
  • I would suggest you try tweaking the other options like max_pending_latency. If that does not work as expected, try basic_scaling. Otherwise I would suggest submitting a support ticket. – Cloudkollektiv Oct 15 '20 at 11:44
  • I've set the "max_pending_latency" to "10s" now. I'll see how it goes. – jln-dk Oct 15 '20 at 14:13
  • Added some extra clarification, let me know if anything works! – Cloudkollektiv Oct 15 '20 at 14:37
  • Thank you very much for your help, @Nebulastic! Really appreciated! – jln-dk Oct 16 '20 at 06:56
1

Update after 24 hours:

I followed @Nebulastic's suggestions 2 and 4, but they did not make any difference. So in frustration I disabled the entire App Engine application (App Engine > Settings > Disable application), left it off for 10 minutes, and confirmed in the monitoring dashboard that everything was dead (sorry, users!).

After 10 minutes I enabled App Engine again and it booted only 1 instance. I've been monitoring it closely since and it seems (finally) to be good now. And now after the restart it also adheres to the "min" and "max" idle instances configuration - the suggestion from @Nebulastic. Thanks!

Screenshots:

[Two screenshots; images not available]

jln-dk
  • What did the logging say with this filter? resource.type="gae_app" resource.labels.module_id="default" severity=(EMERGENCY OR ALERT OR CRITICAL OR ERROR OR WARNING) – Cloudkollektiv Oct 16 '20 at 07:09
  • Nothing. No logs at all for that query. Also tried without module_id, still nothing. – jln-dk Oct 16 '20 at 11:05
0

Have you checked to make sure you don't have a bunch of old versions still running? https://console.cloud.google.com/appengine/versions

Check each service in the services dropdown.

Alex
  • Thank you for your suggestion. But yes, eventually I even deleted all old versions so only 1 version (the current) was shown on the list. But no difference. – jln-dk Oct 16 '20 at 06:21