Why is my Google App Engine instance initialization latency so high (50s)?

Question

The following global imports seem to increase the instance initialization latency from 4s to about 50s:

from keras.optimizers import Adam
from keras.models import load_model

This only occurs on the live App Engine deployment, not in the development preview, which runs main.py almost instantaneously.

I'm on keras==2.2.4 and tensorflow==1.14.0. I checked the logs and the error reporting, and there don't seem to be any errors.

One thing I noticed is that the logs print `Using Tensorflow Backend` about 8 times, whereas the development preview only prints it once. Is it re-importing Keras in a loop?
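A minimal sketch of one way to check this: time the import and log the process ID next to it. If the eight messages carry eight different PIDs, the imports come from separate processes rather than a loop. The helper name `timed_import` is hypothetical, and `json` is only a stand-in module so the sketch runs anywhere.

```python
import importlib
import os
import time


def timed_import(module_name):
    """Import a module by name and return (module, elapsed_seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    # The PID separates "one process importing repeatedly" from
    # "several processes each importing once".
    print(f"pid={os.getpid()} imported {module_name} in {elapsed:.2f}s")
    return module, elapsed


# "json" is a stand-in; in the app this would be "keras.models"
# or "keras.optimizers".
module, seconds = timed_import("json")
```

Note that a module already present in `sys.modules` returns almost immediately, so only the first import in each process shows the real cost.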

I am running on Flask==1.1.2 and instance class F4_1G.

[Image: Latency]

There also seems to be a 30-second delay between TensorFlow setup processes. Why?

Edit #1:

The logs still show multiple `Using Tensorflow Backend` messages despite only initializing a single instance. Also, I'm not sure why the memory usage is at 1.1 GB. I don't initialize anything globally besides a few strings.
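One way to see how much memory the imports themselves account for is to compare the process's peak resident set size before and after an import. This is a rough sketch, assuming a Linux runtime (where `ru_maxrss` is reported in KiB); the helper names are hypothetical and `decimal` is a stand-in for the Keras modules.

```python
import importlib
import resource


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Linux reports KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def import_and_measure(module_name):
    """Return how much the peak RSS grew (in MiB) while importing a module."""
    before = peak_rss_mib()
    importlib.import_module(module_name)
    after = peak_rss_mib()
    return after - before


# "decimal" is a stand-in; in the app this would be "keras.models".
growth = import_and_measure("decimal")
print(f"peak RSS grew by {growth:.1f} MiB, now {peak_rss_mib():.1f} MiB")
```

The number is per process, so it would have to be multiplied by the number of processes that each perform the import to compare it with an instance-level figure.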

Edit #2:

`Using Tensorflow Backend` is printed 8 times because there are 8 gunicorn workers, I think. I can reduce the number of workers by scaling back to an instance class smaller than F4_1G, but then the logs show "Exceeded soft memory limit" and suggest that I upgrade my instance class. I'm not sure why the initialization takes so much memory (1.1 GB).
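If the worker count is the cause, it can be set explicitly instead of being tied to the instance class. Below is a sketch of a gunicorn configuration file; the file name `gunicorn.conf.py` and all values are assumptions for illustration, not settings from the original setup. The setting names (`workers`, `preload_app`, `timeout`) are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file name and values)

# Each worker is a separate process that imports Keras/TensorFlow on its own,
# so fewer workers means fewer copies in memory and fewer parallel imports.
workers = 1

# Load the application once in the master process before forking workers.
# Whether TensorFlow state behaves correctly after the fork is an assumption
# that needs testing.
preload_app = True

# Seconds a worker may stay silent before gunicorn restarts it
# (gunicorn's default is 30).
timeout = 120
```

On App Engine this would presumably be wired in through a custom `entrypoint` in app.yaml, something like `gunicorn -c gunicorn.conf.py -b :$PORT main:app`, where `main:app` assumes the Flask object in main.py is named `app`.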

`Using Tensorflow Backend` is printed only once per process start. Does app engine start multiple python instances? — Stefan Dragnev, May 31 '20 at 21:56
I have automatic scaling enabled, so I suppose it might start multiple instances. I'm not sure how I can monitor the count though. — Allen Chang, May 31 '20 at 22:47
You can use this `gcloud app instances list` to check how many instances are up, here is the [doc](https://cloud.google.com/sdk/gcloud/reference/app/instances/list) for it. This would explain the reason you have so many calls to TensorFlow, but not the latency necessarily though. — Ralemos, Jun 01 '20 at 14:26
The speed on your local machine may be significantly higher than on GAE, see https://stackoverflow.com/a/41390857/4495081. You may want to tweak the autoscaling parameters to account for a longer init time so that it doesn't spawn too many instances (which can be expensive). — Dan Cornilescu, Jun 02 '20 at 02:10
I'm not using my local machine to test; rather, I'm using the GAE "web preview" which is run on the Cloud Shell VM, I think. Still, the Cloud Shell VM takes 1-2 seconds to import the Keras/TF library, whereas GAE takes 50 seconds, on average. I feel like that's miles longer than it should take. — Allen Chang, Jun 02 '20 at 08:06
What `entrypoint` are you using? Are you starting `gunicorn` with multiple workers or threads? You might want to experiment with its `--preload` option. — Dustin Ingram, Jun 02 '20 at 23:01
I've not specified an entrypoint, so it defaults to gunicorn w/ the # of workers according to my instance type, which is F4_1G (8 workers). I guess that's why it was printing out `Using Tensorflow Backend` 8 times — Allen Chang, Jun 02 '20 at 23:47
