I am deploying a Django web app to Google Cloud Run (GCR), and I would like to understand how the autoscaling-related parameters of Gunicorn and GCR relate to each other.
Gunicorn has flags like:
- workers
- threads
- timeout
Google Cloud Run has these configuration options:
- CPU limit
- Min instances
- Max instances
- Concurrency
My understanding so far:
- The number of workers set in Gunicorn should match the CPU limit of GCR.
- We set `timeout` to 0 in Gunicorn to allow GCP to autoscale the GCR instance.
- GCP will always keep some instances alive; this number is `Min instances`.
- When more traffic comes, GCP will autoscale up to a certain number of instances; this number is `Max instances`.
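To make the setup concrete, here is a minimal sketch of the Gunicorn config file I have in mind. Gunicorn config files are plain Python, and `bind`, `workers`, `threads` and `timeout` are real Gunicorn settings. The `CPU_LIMIT` environment variable is hypothetical: Cloud Run does not set it, so it would have to be added to the service's environment by hand. The values are assumptions, not recommendations.

```python
# gunicorn.conf.py (sketch; values are assumptions, not recommendations)
import os

# Hypothetical variable: Cloud Run does not set CPU_LIMIT. It would have to be
# added to the service's environment to mirror the configured CPU limit.
cpu_limit = int(os.environ.get("CPU_LIMIT", "1"))

# Cloud Run tells the container which port to listen on via PORT.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

workers = cpu_limit  # one worker process per vCPU, per my understanding above
threads = 1          # threads per worker; its effect on autoscaling is my question
timeout = 0          # 0 disables Gunicorn's worker timeout entirely
```

It would be started with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject` is a placeholder for the Django project name.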
I want to know the role of threads (Gunicorn) and concurrency (GCR) in autoscaling. More specifically:
- How does the number of threads in Gunicorn affect autoscaling?
I think this should not affect autoscaling at all. Threads are useful for background tasks such as file operations, making async calls, etc.
- How does the Concurrency setting of GCR affect autoscaling?
If the number of workers is set to 1, then a particular instance should be able to handle only one request at a time, so setting Concurrency to anything more than 1 does not help. In fact, we should set CPU limit, Concurrency and workers to the same value. Please let me know if this is correct.
Edit 1: Adding some details in response to John Hanley's comment.
- We expect to have up to 100 req/s. This is based on what we've seen in the GCP console. If our business grows, we'll get more traffic, so I would like to understand how the final decision changes if we were to expect, say, 200 or 500 req/s.
- We expect requests to arrive in bursts. Users are groups of people who perform some activities on our web app during a given time window. There may be only one such event on a given day, but the event will see 1000 or more users using our services for a 30-minute window. On busy days we can have multiple events, and some of them may overlap. The service will be idle outside of the event times.
- How many simultaneous requests can a Cloud Run instance handle? I am trying to understand this one myself. Without Cloud Run, I could've deployed this with x workers and the answer would've been x. But with Cloud Run, I don't know if the number of Gunicorn workers has the same meaning.
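To show what I mean by "how the final decision changes", here is the rough back-of-envelope arithmetic I have been using. It applies Little's law (average requests in flight = request rate × average latency) and divides by the per-instance concurrency. This is only my own estimate, not Cloud Run's actual autoscaling algorithm. The 200 ms average latency is an assumed figure, not a measurement, and the function name is made up for illustration.

```python
import math


def instances_needed(req_per_s: float, avg_latency_ms: float, concurrency: int) -> int:
    """Rough estimate of the instances required to serve a steady request rate."""
    # Little's law: average in-flight requests = arrival rate x time in system.
    in_flight = req_per_s * avg_latency_ms / 1000
    # Assume each instance is sent at most `concurrency` requests at a time.
    return max(1, math.ceil(in_flight / concurrency))


ASSUMED_LATENCY_MS = 200  # assumption, not a measured value

for rate in (100, 200, 500):
    for concurrency in (1, 8):
        n = instances_needed(rate, ASSUMED_LATENCY_MS, concurrency)
        print(f"{rate} req/s, concurrency={concurrency}: ~{n} instances")
```

If this reasoning is right, the Concurrency value directly determines how many instances a burst needs, which is why I want to know how it should relate to workers and threads.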
Edit 2: more details.
- The application is stateless.
- The web app reads and writes to DB.