I have a machine learning application that uses Flask to expose an API (not a good idea for production, I know, but even if I move to Django later the core of the question shouldn't change).
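For context, the endpoint looks roughly like this, simplified to a single synchronous route; the route name, model path, and payload shape are placeholders rather than the real code:

```python
# app.py -- a simplified sketch of the kind of endpoint the app exposes
from flask import Flask, request, jsonify
import joblib  # assumption: a scikit-learn style model serialized with joblib

app = Flask(__name__)
model = joblib.load("model.pkl")  # loaded once at startup

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features]).tolist()  # CPU-heavy inference
    return jsonify({"prediction": prediction})
```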
The main problem is how to serve multiple concurrent requests to the app. A few months back, Celery was added to get around this, with the number of Celery workers set equal to the number of cores on the machine. For a small number of users this looked fine and ran in production for some time.
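The Celery side is wired up roughly like this; the broker URL, module name, and model path are simplified placeholders, not the exact production code:

```python
# tasks.py -- a simplified sketch of the existing Celery setup
from celery import Celery
import joblib  # assumption: the model is deserialized with joblib

celery_app = Celery("ml_tasks", broker="redis://localhost:6379/0")
model = joblib.load("model.pkl")  # loaded once per worker process

@celery_app.task
def predict_task(features):
    return model.predict([features]).tolist()

# Workers were started with concurrency equal to the core count, e.g.
#   celery -A tasks worker --concurrency=8
```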
When the number of concurrent users increased, it became clear we should run a performance test. It turned out the app can only handle about 20 concurrent users on an 8-core machine with 30 GB of RAM, without authentication and without any front-end, which does not look like a good number.
I didn't know there were things like application servers, web servers, and model servers. While googling this problem, Gunicorn came up as a good application server for Python applications.
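From what I understand so far, running the app under Gunicorn would look something like the config below; the worker count and bind address are guesses on my part, not something I have benchmarked:

```python
# gunicorn.conf.py -- a minimal sketch, not a tested production config
# Run with: gunicorn -c gunicorn.conf.py app:app
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1   # rule of thumb from the docs
# worker_class = "gthread"  # threads might help if requests are I/O-bound
# threads = 4
```

Which brings me to my questions: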
- Should I use `gunicorn` (or any other application server) along with `celery`, and why?
- If I remove `celery` and only use `gunicorn` with the application, can I achieve concurrency? I have read somewhere that `celery` is not good for machine learning applications.
- What are the purposes of `gunicorn` and `celery`, and how can we get the best out of both? (A rough sketch of the combination I have in mind is at the end of the question.)
Note: the main goal is to maximize concurrency. In production, authentication will be added, and a front-end application may sit in between.
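For completeness, this is roughly the combination I have in mind: Gunicorn serving the Flask workers, with each request handing the heavy model call off to Celery. The task name, broker URL, and polling endpoint are placeholders I made up for illustration:

```python
# sketch.py -- a rough sketch, assuming a Redis broker/backend; Gunicorn would
# serve this Flask app while the expensive inference runs in a Celery worker
from flask import Flask, request, jsonify
from celery import Celery
from celery.result import AsyncResult

app = Flask(__name__)
celery_app = Celery("ml_tasks",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")

@celery_app.task
def predict_task(features):
    return sum(features)  # placeholder for the real model inference

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    task = predict_task.delay(features)  # enqueue instead of blocking the worker
    return jsonify({"task_id": task.id}), 202

@app.route("/result/<task_id>")
def result(task_id):
    res = AsyncResult(task_id, app=celery_app)
    if not res.ready():
        return jsonify({"status": "pending"}), 202
    return jsonify({"status": "done", "prediction": res.get()})
```

Whether this split is right, or whether Gunicorn workers alone would be enough, is essentially what I'm asking.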