I am using a Cloud Run instance to serve my FastAPI project, which includes two submodules. Both are C calculation models that are not very heavy performance-wise.
I am now load testing the main endpoint with my own Python script, which is basically a function that sends requests continuously and measures the response time of each one. I run the same function on multiple threads so I can simulate a real environment with several concurrent users.
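A minimal sketch of this kind of load test is below. It is not the exact script: the thread count, duration, and 10-second "slow" threshold are placeholder assumptions, the helper names (`fetch`, `worker`, `run_load_test`, `summarize`) are hypothetical, and it uses the standard library's `urllib` rather than any particular HTTP client.

```python
import threading
import time
import urllib.request


def fetch(url):
    """Send one GET request and read the full response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        resp.read()


def worker(url, deadline, latencies, lock, send):
    """Send requests back to back until the deadline, recording each latency."""
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            send(url)
        except Exception:
            pass  # failed requests are still timed in this sketch
        elapsed = time.monotonic() - start
        with lock:  # the list is shared by all worker threads
            latencies.append(elapsed)


def run_load_test(url, n_threads=10, duration_s=60.0, send=fetch):
    """Run n_threads workers in parallel and return all measured latencies."""
    latencies = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s
    threads = [
        threading.Thread(target=worker, args=(url, deadline, latencies, lock, send))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def summarize(latencies, slow_threshold_s=10.0):
    """Return (request count, worst latency, number of requests over the threshold)."""
    slow = [x for x in latencies if x > slow_threshold_s]
    return len(latencies), max(latencies, default=0.0), len(slow)
```

Calling `run_load_test` with the endpoint URL and passing the result to `summarize` gives the total number of requests, the worst response time, and how many requests exceeded the threshold. The `send` parameter exists only so the request function can be swapped out, for example with a stub when trying the script without network access.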
I want to note that I have experimented with the different options and resources available for Cloud Run: more vCPUs, more memory, the CPU allocation setting, etc.
The issue is that when I send constant multi-threaded requests to the endpoint for some time, say a minute, at some point I get a few (3-4) responses that take ~20 seconds, which is slow. Normally, all requests are processed in 1-4 seconds. I have tried configuring the request concurrency, the number of active instances, and so on.
Does anyone have any idea why this may be happening?
I also want to note that nothing much shows up in the logs for those slow responses. Additionally, I do not see any peaks on the metrics tab.