I have a web application written in ASP. NET 6.x. I'm currently just finishing up migrating the project from AWS to Google Cloud, but I'm noticing that latency is overall higher and some more complex operations are taking drastically longer to process on my server compared to running on AWS t2-small
instance.
My application api server runs in docker, deployed to Google Cloud Run, 4 vCPU and 2GB of memory (gen2 instances).
It connects to a CloudSQL mysql 8.0
database which is running on db-g1-small
1.7 GB memory, 1 vCPU.
I also have a load balancer in front of the Cloud Run application to support my custom HTTPS domain.
However almost all requests to my server take 500ms TTFB in chrome (time to first byte response) even for trivial 1ms database queries.
For reference, copying the entire production database to my local machine and running the serer locally, the exact same request TTFB is around 50ms.
I understand being hosted non locally would add some latency, to/from the load balancer, but I don't see how it would add ~500ms.
Here is the things I have tried to rule out some causes:
- I have tried bypassing the load balancer and connecting to Cloud run instance directly. Had no effect.
- I have bumped up Cloud run to max CPU settings and drastically more RAM than my app requires, but it had no effect. I also tried making the CPU "always on".
- I have tried bumping up the instance type on Cloud SQL to have more vCPU and RAM, but it also had no effect.
- I tried running the application locally, but connecting to the Cloud SQL database. Requests processed in around 600ms.
- Calling an endpoint that does not query Cloud SQL also has at least 300ms of latency. For reference, this takes about 19ms when running locally.
- According to speed tests, my Internet ping is 23ms.
- The Cloud Run metrics don't show significant CPU or Memory usage, so it seems like a network problem
Based on the above information, it seems like most of the latency "300ms" is just getting to / processing on Cloud run itself. 200ms more if a query is involved. Given that more complex requests seem to have disproportionately more latency, it seems like the cause is just weak single thread CPU power on Google cloud run.
Is it normal that a trivial HTTP request takes a minimum of 300ms on Cloud Run? How can I troubleshoot this and/or resolve it?