
I am trying to use Cloud Run to run a microservice connected to Firestore. The microservice uses s2geometry to build multiple geographical zones with specific attributes, which helps me localize users and send them information according to the zone I locate them in.

I used Python 3.7 and FastAPI to create the microservice and the routes to communicate with it.
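To give an idea of the shape of the service, here is a simplified sketch of one such route (this uses the `s2sphere` Python port of s2geometry, and the zone level, tokens and attributes are placeholders, not my real data):

```python
# Simplified sketch of a localization route (placeholder zone data).
from fastapi import FastAPI
import s2sphere

app = FastAPI()

ZONE_LEVEL = 12  # placeholder S2 cell level used to define the zones

# Placeholder zone table: S2 cell token -> zone attributes
ZONES = {
    "2a10d": {"name": "zone-a", "message": "hello from zone A"},
}

@app.get("/zone")
def locate_user(lat: float, lng: float):
    # Map the user's position to the S2 cell that contains it at ZONE_LEVEL,
    # then look up the attributes attached to that zone.
    cell = s2sphere.CellId.from_lat_lng(
        s2sphere.LatLng.from_degrees(lat, lng)
    ).parent(ZONE_LEVEL)
    token = cell.to_token()
    return {"cell": token, "zone": ZONES.get(token)}
```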

The microservice runs smoothly on my local machine and on Compute Engine instances, where most of my routes take less than 150 ms to answer when I test them. However, I have a latency issue when I deploy it with Cloud Run. From time to time the microservice takes a really long time to answer (up to 15 minutes) and I can't pinpoint what exactly causes it.
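(For reference, a single request can be timed from the command line with something like the following; the URL below is a placeholder, not my actual service.)

```
curl -s -o /dev/null -w "%{time_total}s\n" "https://my-service-xxxxx-ew.a.run.app/zone?lat=48.85&lng=2.35"
```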

Here is a screenshot where we can see the Request Count and the Request Latency:

[Screenshot: Request Count and Request Latency]

There is no real correlation between request latency and the number of requests, or at least no trivial one. I also looked at the memory usage of the service, and it stays at 30% at most. The CPU usage, however, sometimes hits 100%, but not necessarily when requests are slow.

Finally, when I explored the Trace List and compared requests with high latency to fast ones, I noticed the following difference:

[Trace of a slow request]
[Trace of a fast request]

Fast requests seem to call themselves whereas slow requests don't, and I do not know why.

For now we do not really have a lot of users, so I thought it could be a cold start issue, but slow requests are not necessarily the first ones.

Now, to be honest, I don't know what's going on here or what Cloud Run does (or what I did wrong). I also find it pretty difficult to find a thorough explanation of how Cloud Run actually works, so if you have one (other than Google's own) I would gladly dive into it.

Thank you very much for your help.

Luka Barisic
  • Have you timed how long s2geometry takes to create the geometries? Also, how much space does it use? Since it's a public server, it's probable that you are requesting more resources than you actually have. If you are using containers, check their "physical" limits. – lsabi Sep 22 '20 at 09:26
  • Hey @lsabi The Docker image weighs 2.19 GB on its own. My initialization step takes about 2 minutes for a total size of 226210319 bytes, which is about 226 MB (guppy output of `hpy().heap()`). I did not think about it to be honest, as I don't think my microservice is that heavy, but I'll look into that. Thank you – Luka Barisic Sep 22 '20 at 09:54
  • Pay attention: the weight of the docker image could be the DISK SPACE it uses, not the RAM it needs. Try to check both, but usually disk space is not a problem as it throws an error immediately. – lsabi Sep 22 '20 at 10:17
  • Thank you for reminding me of that. I checked via docker stats and it only takes 271 MB of RAM. I'll try monitoring the container stats when deployed on Cloud Run (a quick sketch of both checks follows these comments). – Luka Barisic Sep 22 '20 at 13:49
  • @lsabi I encourage you to post your comment as an answer to benefit the community. – MrTech Sep 24 '20 at 17:41
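For completeness, a minimal sketch of the two checks discussed in the comments, assuming `guppy3` is installed (`pip install guppy3`):

```python
# Rough sketch of the heap check mentioned above.
from guppy import hpy

heap = hpy().heap()
print(heap)       # breakdown of live Python objects by type
print(heap.size)  # total heap size in bytes (the ~226 MB figure above)
```

and, on the container side (the container name is a placeholder):

```
docker stats --no-stream my-container   # RAM actually used by the running container
```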

1 Answer


After several tests, it seems that it was a cold start issue. Cloud Run containers are stopped after a certain period if they are not being used, and as we did not have a lot of traffic, the container had to start from scratch almost every time a user wanted to access the app.

Solution:

I created a Cloud Function that sends a request to the container when triggered and then created a Cloud Scheduler job that runs the function every minute.
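For illustration, the function boils down to something like this (a minimal sketch; the URL and names are placeholders, and `requests` has to be listed in the function's requirements.txt):

```python
# main.py of an HTTP-triggered Cloud Function (Python runtime) that pings the service.
import requests

SERVICE_URL = "https://my-service-xxxxx-ew.a.run.app/"  # placeholder Cloud Run URL

def keep_warm(request):
    # Hitting any lightweight route is enough to keep an instance alive.
    resp = requests.get(SERVICE_URL, timeout=30)
    return f"pinged {SERVICE_URL}: {resp.status_code}"
```

and the Cloud Scheduler job that calls the function every minute can be created with something like:

```
gcloud scheduler jobs create http keep-warm \
  --schedule="* * * * *" \
  --uri="https://REGION-PROJECT_ID.cloudfunctions.net/keep_warm" \
  --http-method=GET
```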

Note:

If different revisions are routed to your service, you need to create a Cloud Scheduler job for each revision. To do so, you have to create a revision URL (tag) for each of the routed revisions (currently in beta).
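For example, tagging a revision looks roughly like this (the service and revision names are placeholders; the feature was still in beta when this was written):

```
gcloud beta run services update-traffic my-service \
  --update-tags=green=my-service-00002-abc
# The tagged revision then gets its own URL of the form
# https://green---my-service-xxxxx-ew.a.run.app
```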

Edit:

Now, as @Jofre mentioned, you can choose to always have an instance of your service running by setting the "Minimum number of instances" to 1. If you are using the console, GCP even tells you to "Set to 1 to reduce cold starts".
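If you manage the service through YAML rather than the console, this should map to the Knative-style `minScale` annotation on the revision template (a sketch, with the rest of the spec omitted):

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"  # keep at least one instance warm
```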

Luka Barisic
  • You can use `gcloud beta run services update SERVICE --min-instances 1` to keep a warm instance and avoid cold starts. That parameter is documented [here](https://cloud.google.com/sdk/gcloud/reference/beta/run/services/update#--min-instances). This will avoid the need for the service to be called periodically by the function. – Jofre Oct 02 '20 at 10:39
  • Now there's even some real documentation explaining how `min-instances` helps with cold starts: https://cloud.google.com/run/docs/configuring/min-instances – Jofre Oct 16 '20 at 09:37
  • @Jofre Yes, I noticed that not long ago. Now it's all a question of cost optimization, given that having a container always up has a cost. Anyway, thanks a lot! – Luka Barisic Oct 19 '20 at 13:19