
I have a machine learning application that uses Flask to expose an API (for production this is not a good idea, but even if I switch to Django in the future, the gist of the question shouldn't change).

The main problem is how to serve multiple concurrent requests to my app. A few months back, Celery was added to get around this problem, with the number of Celery workers set equal to the number of cores on the machine. With very few users this looked fine, and it ran in production for some time.

When the number of concurrent users increased, it became evident that we should do some performance testing. It turns out the app can handle about 20 concurrent users on a machine with 30 GB of RAM and 8 cores, without authentication and without any front-end, which does not look like a good number.

I didn't know there were things like application servers, web servers, and model servers. When googling this problem, Gunicorn came up as a good application server for Python applications.

  • Should I use Gunicorn (or some other application server) along with Celery, and why?
  • If I remove Celery and use only Gunicorn with the application, can I achieve concurrency? I have read somewhere that Celery is not good for machine learning applications.
  • What are the purposes of Gunicorn and Celery, and how can we get the best out of both?

Note: The main goal is to maximize concurrency. Authentication will be added when serving in production, and a front-end application might come into play as well.

Abhisek
  • Why is it not a good idea to expose an API in production with Flask? – Charles Landau Dec 19 '18 at 06:16
  • due to this: http://flask.pocoo.org/docs/0.12/deploying/ – Abhisek Dec 19 '18 at 06:18
  • "Flask’s _built-in server_ is not suitable for production" (emphasis mine) - not "Flask is not suitable for production". There is a distinction there. The entire rest of the page is dedicated to different setups which _are_ recommended. – Amadan Dec 19 '18 at 06:19
  • @Abhisek So use a different server with flask... – Charles Landau Dec 19 '18 at 06:22
  • Flagged as duplicate of: https://stackoverflow.com/questions/10938360/how-many-concurrent-requests-does-a-single-flask-process-receive – Charles Landau Dec 19 '18 at 06:23
  • It's not a duplicate of that. It might be similar, but it also takes Celery into consideration. – Abhisek Dec 19 '18 at 06:24
  • @Abhisek Then the flag will not be accepted – Charles Landau Dec 19 '18 at 06:26
  • If your machine learning project is CPU intensive, concurrency is not your savior. You should consider optimizing your application first, e.g. cache your results, save your model, use online learning, etc. – ssword Jan 16 '20 at 15:59

1 Answer


There is no shame in flask. If in fact you just need a web API wrapper, flask is probably a much better choice than django (simply because django is huge and you'd be using only a fraction of its capability).

However, your concurrency problems are apparently stemming from the fact that you are doing some heavy-duty processing for each request. There is simply no way around that; if you require a certain amount of computational resources per request, you can't magic those up. From here on, it's a juggling act.

  • If you want a guaranteed response immediately, you need to have as many workers as potential simultaneous requests. This may involve load balancing over multiple servers, if you can't scrounge up enough resources on one server. (cue gunicorn, a web application server, responsible for accepting connections and then distributing them to multiple application processes.)

  • If you are okay with not getting an immediate response, you can let stuff queue up. (Cue celery, a task queue, which worker processes can use to retrieve the next thing to be done, and deposit results.) This works best if you don't need a response in the same request-response cycle; e.g. the client submits a job and gets back only an acknowledgement that the job has been received; a second request is needed to ask about the status of the job, and possibly its results if it is finished.

  • Alternatively, instead of Flask you could use WebSockets or Tornado, to push the response out to the client when it is available (as opposed to the client polling for results, or waiting on a live HTTP connection and tying up a server process).
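For the first option, a minimal Gunicorn setup might look like the sketch below. The module path `app:app` is an assumption about your project layout, and the worker formula is just the classic starting point from the Gunicorn docs; for a CPU-bound model, one worker per core (or fewer, if each worker loads its own copy of the model into RAM) is often more realistic.

```python
# gunicorn.conf.py -- a minimal sketch, not a tuned production config.
# Assumptions: your Flask app object lives at app:app, and you will
# adjust workers to your actual CPU/memory budget.
import multiprocessing

# Gunicorn's rule-of-thumb starting point: (2 x cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1
bind = "0.0.0.0:8000"
timeout = 120  # give slow model inference time before the worker is killed
```

You would then start the server with `gunicorn -c gunicorn.conf.py app:app` and load-test to find the worker count that actually fits your machine.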
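The submit-then-poll flow in the second option can be sketched without Celery itself, using only the standard library (a thread pool stands in for the Celery workers, and `run_model` is a hypothetical placeholder for your expensive inference call):

```python
# Sketch of the submit / acknowledge / poll-for-result pattern,
# with concurrent.futures standing in for Celery for illustration.
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # stand-in for Celery workers
jobs = {}  # job_id -> Future; Celery would keep this in its result backend

def run_model(payload):
    # hypothetical stand-in for the expensive ML inference
    return {"prediction": sum(payload)}

def submit_job(payload):
    """Client's first request: enqueue work, return an id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(run_model, payload)
    return job_id

def job_status(job_id):
    """Client's second request: poll for status and, if done, the result."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}

job = submit_job([1, 2, 3])
print(job_status(job))  # may still be pending at this point
jobs[job].result()      # (demo only) wait for the worker to finish
print(job_status(job))  # {'status': 'done', 'result': {'prediction': 6}}
```

In a real deployment the `jobs` dict would live in a broker/result backend (Redis, RabbitMQ) so that Gunicorn web workers and Celery task workers can be separate processes on separate machines.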

Amadan