How to troubleshoot high latency when running on GCP Cloud Run + Cloud SQL

Question

I have a web application written in ASP.NET 6.x. I'm just finishing migrating the project from AWS to Google Cloud, but I'm noticing that overall latency is higher, and some more complex operations take drastically longer to process than they did on an AWS t2.small instance.

My application's API server runs in Docker, deployed to Google Cloud Run with 4 vCPU and 2 GB of memory (gen2 instances).

It connects to a Cloud SQL MySQL 8.0 database running on db-g1-small (1.7 GB memory, 1 vCPU).

I also have a load balancer in front of the Cloud Run application to support my custom HTTPS domain.

However, almost all requests to my server show a TTFB (time to first byte) of about 500ms in Chrome, even for trivial 1ms database queries.

For reference, after copying the entire production database to my local machine and running the server locally, the TTFB for the exact same request is around 50ms.

I understand that not being hosted locally adds some latency to and from the load balancer, but I don't see how it would add ~500ms.

Here are the things I have tried in order to rule out some causes:

I have tried bypassing the load balancer and connecting to the Cloud Run instance directly. It had no effect.
I have bumped Cloud Run up to the maximum CPU setting and drastically more RAM than my app requires, but it had no effect. I also tried making the CPU "always on".
I have tried bumping up the instance type on Cloud SQL to have more vCPU and RAM, but it also had no effect.
I tried running the application locally while connecting to the Cloud SQL database. Requests were processed in around 600ms.
Calling an endpoint that does not query Cloud SQL also has at least 300ms of latency. For reference, this takes about 19ms when running locally.
According to speed tests, my Internet ping is 23ms.
The Cloud Run metrics don't show significant CPU or memory usage, so it seems like a network problem.

Based on the above, most of the latency (about 300ms) seems to come from just reaching, or processing on, Cloud Run itself, with 200ms more if a query is involved. Given that more complex requests seem to have disproportionately more latency, the cause seems to be weak single-thread CPU power on Google Cloud Run.

Is it normal that a trivial HTTP request takes a minimum of 300ms on Cloud Run? How can I troubleshoot this and/or resolve it?
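
The comparisons listed above (load balancer vs. direct, local vs. Cloud Run, with and without a database query) are easier to reason about when connection setup is timed separately from the wait for the response. The following is a minimal sketch of one way to do that from a client machine, assuming plain GET endpoints; the function names `time_request` and `summarize` are made up for illustration and are not part of any Google Cloud or ASP.NET tooling.

```python
import statistics
import time
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def time_request(url, timeout=10.0):
    """Time one GET request; return connect and first-byte durations in ms."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    t0 = time.perf_counter()
    conn.connect()              # TCP handshake (plus TLS handshake for https)
    t1 = time.perf_counter()
    conn.request("GET", path)
    resp = conn.getresponse()   # returns once the status line and headers arrive
    t2 = time.perf_counter()
    resp.read()                 # drain the body so the connection closes cleanly
    conn.close()

    return {
        "status": resp.status,
        "connect_ms": (t1 - t0) * 1000.0,
        "ttfb_ms": (t2 - t1) * 1000.0,
    }


def summarize(url, samples=10):
    """Repeat the measurement and report medians to smooth out outliers."""
    runs = [time_request(url) for _ in range(samples)]
    return {
        "connect_ms_median": statistics.median(r["connect_ms"] for r in runs),
        "ttfb_ms_median": statistics.median(r["ttfb_ms"] for r in runs),
    }
```

Running `summarize` against the load balancer URL, the direct Cloud Run URL, and a local instance should show whether the extra time sits in connection setup or in the wait for the first byte. Each call opens a fresh connection, so `connect_ms` is paid every time, whereas a browser normally reuses connections; `ttfb_ms` is the figure closest to what Chrome reports as waiting time.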

I would say it's something wrong in your container or in your app. Replace the container with something static, like serving a HTML page, you will get quick responses. I would investigate region as well, but 99% is something wrong in your container. Experiment with a really basic container out of the .NET ecosystem, as I have a hunch that something in your .NET ecosystem boots up slowly in your container. — Pentium10, Nov 13 '21 at 21:15
Share a hello world example of your container you are using, use a gist, or a github link for quick sharing. We can try out, to verify if the container is "slow inside". — Pentium10, Nov 13 '21 at 21:22
I tried creating a new Cloud run service with the default "hello" container and the latency is around 100ms which is about the same as my ping to Montreal. I'm going to try changing my base image to alpine to see if that helps at all. Seems like the culprit is my code, but I still don't understand how it runs so much slower on GCP than my local PC. — Brad, Nov 13 '21 at 21:54
When you test locally, test with the container rather than the code directly, so that you compare like with like. In addition, keep in mind that your CPU can go up to 4 GHz (depending on the age and power of your machine). In the cloud it's more like 2 or 2.4 GHz, so a factor of 1.5 or 2 is acceptable. — guillaume blaquiere, Nov 14 '21 at 11:44
@Brad I am able to see 50ms consistently, so there is room for container improvement. https://i.imgur.com/MoShogV.png I'm skeptical that the .NET environment can achieve this, but on Linux with other stacks, even PHP as in my image, it is possible. — Pentium10, Nov 14 '21 at 13:18
I decreased logging frequency (removed DEBUG and INFO messages from being logged) which seemed to have shaved 50-100ms on most responses. In conjunction with the comment by @guillaumeblaquiere, I'm skeptical if it can go down much further just due to the single core performance of cloud servers — Brad, Nov 14 '21 at 18:59
@Brad cloud run uses multi cpu, also you should check out these recommendations as well https://stackoverflow.com/questions/54103309/how-to-configure-asp-net-kestrel-for-low-latency — Pentium10, Nov 14 '21 at 19:20
@Brad I tried out this .NET hello world example in the closest GCP region I have on Cloud Run (check yours at https://www.gcping.com), and I am able to get 40ms response times: https://github.com/knative/docs/tree/ca0a1b32da4df27a043ebdbd0bd032d1723bbadb/docs/serving/samples/hello-world/helloworld-csharp — Pentium10, Nov 14 '21 at 19:37
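
The figures reported in this thread can be combined into a rough latency budget. The sketch below is only back-of-the-envelope arithmetic: it takes the roughly 100ms measured with the default "hello" container as the platform floor, the 19ms local time for the endpoint without a query, and the upper clock-speed factor of 2 from the comments. The function name `latency_budget` and the additive model are assumptions made for illustration.

```python
def latency_budget(observed_ms, platform_floor_ms, local_ms, cpu_slowdown):
    """Estimate how much of the observed latency the known factors explain.

    Assumes a simple additive model: network/platform floor plus the local
    processing time scaled by a CPU slowdown factor.
    """
    expected = platform_floor_ms + local_ms * cpu_slowdown
    return {
        "expected_ms": expected,
        "unexplained_ms": observed_ms - expected,
    }


# Figures taken from the thread: 300ms observed without a DB query,
# ~100ms for the default "hello" container, 19ms locally, factor of 2.
budget = latency_budget(
    observed_ms=300, platform_floor_ms=100, local_ms=19, cpu_slowdown=2.0
)
print(budget)  # {'expected_ms': 138.0, 'unexplained_ms': 162.0}
```

Under these assumptions, slower cloud CPUs and network distance account for less than half of the 300ms, which is consistent with the suggestion that something inside the container (logging, for example) is responsible for the rest.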
