Cloud Run - Slow responses from time to time

Question

I am using a Cloud Run instance to serve my FastAPI project, which includes 2 submodules that are C calculation models and are not too heavy performance-wise.

I am now load testing the main endpoint with my own Python script, which is basically a function that constantly makes requests and measures the response time. I run the same function on multiple threads so I can simulate a real environment with multiple users.
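For reference, the load test looks roughly like the sketch below. It is a simplified reconstruction, not the exact script: the names `timed_call`, `worker` and `run_load_test`, and the default thread and request counts, are placeholders chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request):
    """Call send_request() once and return the elapsed time in seconds."""
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def worker(send_request, requests_per_thread):
    """Send requests back to back from one thread, collecting latencies."""
    return [timed_call(send_request) for _ in range(requests_per_thread)]


def run_load_test(send_request, num_threads=8, requests_per_thread=20):
    """Run the same worker on several threads and return all latencies."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(worker, send_request, requests_per_thread)
            for _ in range(num_threads)
        ]
        latencies = []
        for future in futures:
            latencies.extend(future.result())
    return latencies
```

In practice `send_request` would be something like `lambda: requests.get(url, timeout=60)`, where `url` is the Cloud Run endpoint and `requests` is the third-party HTTP library. Passing the call in as a function keeps the timing logic independent of the HTTP client.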

I want to note that I experimented with the different options and resources available for Cloud Run: more vCPUs, more memory, CPU allocation, etc.

The issue is that when I send constant multi-threaded requests to the endpoint for some time, say a minute, at some point I get a few (3-4) responses that take ~20 seconds, which is slow. Normally, all requests are processed in 1-4 seconds. I tried configuring the request concurrency, the number of active instances, and so on.
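To make the outliers easier to see, the measured latencies can be summarized as in the sketch below. The function name `summarize` and the 10-second `slow_threshold` are assumptions picked for illustration; any cutoff between the normal 1-4 seconds and the ~20 second outliers would work.

```python
import statistics


def summarize(latencies, slow_threshold=10.0):
    """Summarize response times (seconds) and list the slow outliers."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
        # Responses at or above the (assumed) threshold count as slow.
        "slow": [t for t in ordered if t >= slow_threshold],
    }
```

Logging a timestamp next to each slow response would also show whether the outliers cluster at one moment, which might point to an instance starting up, or are spread across the run.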

Does anyone have an idea why this might be happening?

I want to note that nothing unusual shows up in the logs for those slow responses. Additionally, I do not see any peaks on the metrics tab.

You might find [this answer](https://stackoverflow.com/a/71517830/17865804) helpful — Chris, Jun 21 '23 at 09:59
This could be due to a [cold start](https://github.com/ahmetb/cloud-run-faq#does-cloud-run-have-cold-starts). Have you tried setting minimum instances (`--min-instances`)? — Roopa M, Jun 21 '23 at 10:19
I don't know FastAPI and Python very well, but I know that Python is, by design, single-threaded. My first guess is this: even if you have 2 or more vCPUs, all the requests are processed by the same vCPU and the others do nothing. You should be able to validate that by looking at the CPU graph: with 2 vCPUs you should see about 50% usage, with 4 vCPUs about 25%, and so on. It's also consistent with the metrics (4 requests of 4 seconds + context switching = 20s). If that's the case, you should use a dispatcher that receives the requests and spawns a new thread for each. Gunicorn is used with Flask. — guillaume blaquiere, Jun 21 '23 at 19:45
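A minimal sketch of the dispatcher idea from the comment above, assuming the app is served with uvicorn and that one worker process per vCPU is a reasonable starting point. The `"main:app"` import path and the `worker_count` helper are hypothetical, and `os.cpu_count()` may not match the Cloud Run vCPU allocation in every container environment.

```python
import os


def worker_count(available_cpus=None):
    """Return one worker process per vCPU, with a minimum of one."""
    cpus = available_cpus if available_cpus is not None else os.cpu_count()
    return max(1, cpus or 1)


def main():
    # Imported here so the helper above works without uvicorn installed.
    import uvicorn

    uvicorn.run(
        "main:app",  # hypothetical module:attribute path to the FastAPI app
        host="0.0.0.0",
        port=int(os.environ.get("PORT", "8080")),  # Cloud Run sets PORT
        workers=worker_count(),
    )
```

Calling `main()` from the container entrypoint would start the server. Separate worker processes each have their own interpreter, so CPU-bound requests can be spread across vCPUs instead of queuing behind one another in a single process.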
