I am going to deploy a Python Flask server with Docker on Kubernetes, using Gunicorn with Gevent/Eventlet asynchronous workers. The application will:
- Subscribe to around 20 different topics on Apache Kafka.
- Score some machine learning models with that data.
- Upload the results to a relational database.
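To make the structure concrete, here is a minimal stdlib-only sketch of the per-message pipeline. All names are placeholders: the real message source is a Kafka consumer (subscribed to the ~20 topics) and the real sink is the relational database; both are stubbed here so the sketch is self-contained.

```python
def score_model(payload):
    # Placeholder for the ML scoring step (takes ~45 s in the real service).
    return {"topic": payload["topic"], "score": len(payload["value"])}

def upload_result(result, sink):
    # Placeholder for the relational-database insert.
    sink.append(result)

def run_pipeline(messages, sink):
    # In the real service, `messages` is the Kafka consumer iterator.
    for msg in messages:
        upload_result(score_model(msg), sink)

results = []
run_pipeline([{"topic": "sensor-1", "value": b"abc"}], results)
```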
Each topic in Kafka receives 1 message per minute, so the application needs to consume around 20 messages per minute in total, and handling a single message takes around 45 seconds.

My question is: how can I scale this properly? I know I can run multiple Gunicorn workers and deploy multiple replicas of the pod on Kubernetes. But is that enough? Will the workload be balanced automatically across the available workers in the different pods? Or what else can I do to ensure scalability?
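For reference, here is the back-of-the-envelope arithmetic behind my concern: the incoming work amounts to roughly 900 seconds of processing per minute of wall-clock time, so the deployment needs at least ~15 messages being handled concurrently just to keep up.

```python
import math

TOPICS = 20
MSGS_PER_TOPIC_PER_MIN = 1
HANDLE_SECONDS = 45

# Seconds of processing work arriving per wall-clock minute:
work_per_min = TOPICS * MSGS_PER_TOPIC_PER_MIN * HANDLE_SECONDS  # 900

# Minimum number of messages that must be in flight concurrently
# for throughput to match the arrival rate:
min_concurrency = math.ceil(work_per_min / 60)  # 15

print(min_concurrency)
```

So however the workers and replicas are arranged, their combined concurrency has to stay above that floor (plus headroom for spikes and restarts).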