Gunicorn (with Flask) parameters for Google Cloud Run (GCR) - what to put in Dockerfile?

Question

Looking for some guidance from people with practical GCR experience. How do you get on with this? I run a Docker container (approx. 670 MB) on Google Cloud Run. Inside is my Flask-based Python server, which is currently started by this command in the Dockerfile:

CMD exec gunicorn --bind 0.0.0.0:8080 --reload --workers=1 --threads 8 --timeout 0 "db_app.app:create_app()"

Say I will need to serve about 300 requests per hour.

How many workers and threads should I specify in my exec command to use GCR's capabilities most effectively?

For example, the basic configuration of a GCR server is something like 1 CPU and 1 GB of RAM.

So how should I set up Gunicorn there? Should I also use --preload, or specify worker-connections?

As Dustin cited in his answer (see below), the official Google docs suggest writing this in the Dockerfile:

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

I have no idea how many cores that "1 CPU" in the GCR configuration has, so I doubt this example is very accurate. It is more likely there just to demonstrate how things work in general. So I (and everyone in my situation) would be very grateful if someone who has a working Gunicorn server packed into a container on Google Cloud Run could share how to configure it properly: basically, what to put into this Dockerfile CMD line instead of the generic example code? Something more real-life-proof.

I think this is a software problem, because we're talking about what to write in a Dockerfile (the question was closed and marked as "not SO scope question").

This question was likely closed since there is now a question regarding vCPUs, which has good explanations [here](https://stackoverflow.com/a/45967989/12762626) and on the GCP [public documentation](https://cloud.google.com/compute/docs/cpu-platforms). Thus, through answering your question regarding the CPUs assigned to [Cloud Run container instances](https://cloud.google.com/run/docs/configuring/cpu#console), it seems that the answer provided by the Googler would likely help here. — KevinH, Nov 18 '20 at 21:34

score 6 · Answer 1 · answered Nov 18 '20 at 18:07

The guidance from Google is the following configuration:

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

Using --preload may reduce cold start times, but it also may lead to unexpected behavior, which is largely dependent on how your application is structured.

You should not use --reload in production.

You should also bind to $PORT and not hard-code 8080 as the port.
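The same settings can also be kept in a Gunicorn config file instead of on the command line. What follows is only a minimal sketch, not part of Google's guidance: it assumes a file named `gunicorn.conf.py` passed explicitly with `--config gunicorn.conf.py`, and a fallback port of 8080 for local runs where `PORT` is not set.

```python
# gunicorn.conf.py -- sketch of the sample CMD flags as a config file.
import os

# Cloud Run injects PORT at runtime; the 8080 fallback is an assumption
# for running the container locally.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

workers = 1          # one worker process, matching the default 1 vCPU
threads = 8          # threads per worker; a starting point to tune from
timeout = 0          # disable Gunicorn's worker timeout, as in the sample CMD
reload = False       # auto-reload is a development feature, keep it off
preload_app = False  # same as omitting --preload; enable only after testing
```

With such a file in place, the Dockerfile line would shrink to something like `CMD exec gunicorn --config gunicorn.conf.py main:app`.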

Dustin Ingram

Thank you for the answer. I saw this example, but it's too generic and I doubt it's really optimal in terms of resource usage. They recommend one worker, but at the same time say to increase the number to match the number of cores, and how many cores are there? That's why I hoped people running similar servers on GCR would share some knowledge about their setups. – Amy Wong Nov 18 '20 at 18:55

By default, Cloud Run instances are allocated 1 vCPU (https://cloud.google.com/run/docs/reference/container-contract#cpu). The number of threads is largely dependent on your workload, see https://docs.gunicorn.org/en/stable/design.html#how-many-threads. The best answer is "start here and tune as necessary". – Dustin Ingram Nov 18 '20 at 21:35

I see. Just above that part on threads, the Gunicorn docs also advise setting the number of workers to `(2 x $num_cores) + 1`. Would it be correct to assume that the minimum number of workers is always 3? Especially in the case of Cloud Run's one CPU. – Amy Wong Nov 19 '20 at 01:50

Because Cloud Run is serverless, it's better/faster/more efficient for multiple instances to serve concurrent requests than to have more workers per instance, as this reduces the overall memory footprint and overhead of each instance. – Dustin Ingram Nov 19 '20 at 15:53

@DustinIngram This makes sense, but do you agree with the guidance of "For environments with multiple CPU cores, increase the number of workers", or should it just be kept at `1` to use more instances instead? – howMuchCheeseIsTooMuchCheese Apr 08 '22 at 14:36
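
To put rough numbers on the trade-off discussed in these comments, here is a small sketch of the arithmetic. The `(2 x cores) + 1` rule is the one from the Gunicorn docs quoted above, and 300 requests per hour is the figure from the question. The helper names and the 0.5 s average request time are assumptions made purely for illustration.

```python
def gunicorn_rule_of_thumb(cores: int) -> int:
    """Worker count suggested by Gunicorn's (2 x cores) + 1 rule."""
    return 2 * cores + 1


def average_in_flight(requests_per_hour: float, seconds_per_request: float) -> float:
    """Average number of requests being handled at once (arrival rate x duration)."""
    return requests_per_hour / 3600 * seconds_per_request


# 1 vCPU, the Cloud Run default mentioned above.
print(gunicorn_rule_of_thumb(1))

# 300 requests/hour at an assumed 0.5 s each: roughly 0.04 in flight on average.
print(average_in_flight(300, 0.5))
```

Under these assumptions, a single worker with 8 threads already has far more capacity than the average load requires. Bursts are a separate matter, and per the comment above they are better absorbed by additional instances than by extra workers per instance.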
