
For deploying a Django web app to GCR (Google Cloud Run), I would like to understand the relationships between the various autoscaling-related parameters of Gunicorn and GCR.

Gunicorn has flags like:

  • workers
  • threads
  • timeout

Google Cloud Run has these configuration options:

  • CPU limit
  • Min instances
  • Max instances
  • Concurrency

My understanding so far:

  • The number of workers set in Gunicorn should match the CPU limit of GCR (a minimal config sketch follows this list).
  • We set timeout to 0 in Gunicorn to let GCP autoscale the GCR instance.
  • GCP will always keep some instances alive; this number is Min instances.
  • When more traffic comes, GCP will autoscale up to a certain number of instances; this number is Max instances.
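For context, this is roughly how I am thinking of wiring Gunicorn up, as a minimal sketch of my current understanding. The `GUNICORN_WORKERS` / `GUNICORN_THREADS` environment variables are just placeholder names I would set at deploy time; only `PORT` is something Cloud Run actually injects.

```python
# gunicorn.conf.py -- minimal sketch of my current understanding, not a vetted config.
import os

# Placeholder env vars I would set to mirror the Cloud Run CPU limit.
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))
threads = int(os.environ.get("GUNICORN_THREADS", "1"))

# Disable Gunicorn's own worker timeout and rely on Cloud Run's request timeout.
timeout = 0

# Cloud Run injects the PORT environment variable.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"
```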

I want to know the role of threads (Gunicorn) and concurrency (GCR) in autoscaling. More specifically:

  • How does the number of threads in Gunicorn affect autoscaling?

I think this should not affect autoscaling at all. Threads are useful for background tasks such as file operations, making async calls, etc.

  • How does the Concurrency setting of GCR affect autoscaling?

If the number of workers is set to 1, then a particular instance should be able to handle only one request at a time, so setting this value to anything more than 1 does not help. In fact, we should set these three to match each other: CPU limit, concurrency, and workers. Please let me know if this is correct.
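To make the relationship I am assuming explicit, this is the back-of-the-envelope rule I am working with (my own assumption, not something I have found in the docs):

```python
# My assumption: with the gthread worker class, each Gunicorn worker can serve
# `threads` requests at once, so one instance handles roughly workers * threads
# simultaneous requests, and Cloud Run's concurrency should not exceed that.
workers = 1
threads = 1

max_simultaneous_requests_per_instance = workers * threads
cloud_run_concurrency = max_simultaneous_requests_per_instance

print(cloud_run_concurrency)  # 1 -> setting concurrency > 1 would not help here
```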

Edit 1: Adding some details in response to John Hanley's comment.

  • We expect up to 100 req/s. This is based on what we've seen in the GCP console. If our business grows, we'll get more traffic, so I would like to understand how the final decision changes if we expect, say, 200 or 500 req/s.
  • We expect requests to arrive in bursts. Users are groups of people who perform some activities on our web app during a given time window. There can be only one such event on a given day, but the event will see 1,000 or more users using our services within a 30-minute window. On busy days, we can have multiple events, some of which may overlap. The service will be idle outside of the event times.
  • How many simultaneous requests can a Cloud Run instance handle? I am trying to understand this one myself. Without Cloud Run, I could have deployed this with x workers and the answer would have been x. But with Cloud Run, I don't know whether the number of Gunicorn workers has the same meaning. (My rough sizing math is below.)
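For reference, this is the rough sizing math I am doing for the burst case. All the numbers are assumptions about our traffic, and the 200 ms average latency is a guess on my part:

```python
# Rough capacity estimate for the burst scenario (all numbers are my assumptions).
import math

peak_rps = 100        # observed peak in the GCP console; could grow to 200-500
avg_latency_s = 0.2   # guessed average request latency (200 ms)
concurrency = 1       # requests one instance handles at once (workers * threads)

# Little's law: average number of in-flight requests = arrival rate * latency.
in_flight = peak_rps * avg_latency_s
instances_needed = math.ceil(in_flight / concurrency)

print(in_flight, instances_needed)  # 20.0 20 -> roughly 20 instances at concurrency 1
```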

Edit 2: more details.

  • The application is stateless.
  • The web app reads and writes to DB.
    This question is difficult to answer with facts and/or citations. A good answer depends on how your Cloud Run instance is used: a) how many simultaneous requests; b) the interval between requests; c) How many simultaneous requests your Cloud Run instance can handle. A Cloud Run instance that is not receiving requests is eligible to be shut down. Item #a will affect configuring Gunicorn. Item #b and #c will affect configuring Cloud Run. Note: you must enable "Always On CPU" to prevent container shutdowns. Add more details to your question (not comments). – John Hanley Sep 13 '22 at 21:16
  • Have a look at this stackoverflow [link](https://stackoverflow.com/a/41696500/18265638) – Sathi Aiswarya Sep 14 '22 at 08:58
  • @JohnHanley I have answered your questions. Please see Edit 1 in the question. Thanks. – Raiyan Sep 14 '22 at 20:44
  • How long does each request take to process on Cloud Run? Does your Django app store state (database)? How are you coordinating the state between multiple Cloud Run instances? Your application is larger than one Cloud Run instance which means you need to synchronize state between instances unless your application is stateless or stores all state at the client via cookies, etc. – John Hanley Sep 14 '22 at 21:02
  • Please refer to this Stack Overflow [link](https://stackoverflow.com/a/71381592/18265638) – Sathi Aiswarya Sep 20 '22 at 12:27
