I have a Kafka cluster (using Aiven on AWS):
Kafka Hardware
Startup-2 (2 CPU, 2 GB RAM, 90 GB storage, no backups) 3-node high availability set
- Ping between my consumers and the Kafka Broker is 0.7ms.
Background
I have a topic such that:
- It contains data about 3000 entities.
- Entity lifetime is a week.
- Each week there is a different set of ~3000 entities (on average).
- Each entity may have between 15k and 50k messages in total.
- There can be at most 500 messages per second.
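
A rough back-of-envelope of the aggregate load, assuming the averages above (the exact numbers vary per entity):

```python
# Rough estimate of total weekly volume and average rate, using the figures above.
entities_per_week = 3_000
msgs_per_entity_low, msgs_per_entity_high = 15_000, 50_000

weekly_low = entities_per_week * msgs_per_entity_low    # 45,000,000 messages/week
weekly_high = entities_per_week * msgs_per_entity_high  # 150,000,000 messages/week

seconds_per_week = 7 * 24 * 3600
print(f"average rate: {weekly_low / seconds_per_week:.0f}"
      f" - {weekly_high / seconds_per_week:.0f} msg/s (peaks capped at 500 msg/s)")
# -> roughly 74 - 248 msg/s on average
```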
Architecture
My team built an architecture in which a group of consumers parses this data, performs some transformations (without any filtering!), and then sends the final messages back to Kafka, to topic=<entity-id>.
In other words, the transformed data is written back to Kafka, into a topic that contains only the data of a single entity.
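
A minimal sketch of that flow, just to illustrate what I mean (I'm using confluent-kafka here; the topic names, the `entity_id` field, and the transform are placeholders, not our real code):

```python
import json
from confluent_kafka import Consumer, Producer

# Hypothetical names/config for illustration only.
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "entity-transformers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["raw-events"])          # the shared source topic
producer = Producer({"bootstrap.servers": "kafka:9092"})

def transform(record: dict) -> dict:
    # Placeholder for our transformations (no filtering).
    return record

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    out = transform(record)
    # Produce the transformed message to the per-entity topic.
    producer.produce(f"entity-{record['entity_id']}", json.dumps(out).encode())
    producer.poll(0)                        # serve delivery callbacks
```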
Questions
At any given time, there can be up to 3-4k topics in Kafka (one topic per unique entity).
- Can my Kafka cluster handle this well? If not, what do I need to change?
- Do I need to delete topics, or is it fine to accumulate (a lot of!) unused topics over time? (I've sketched after this list what deletion would look like, in case it is necessary.)
- Each consumer of the final messages will consume 100 topics at the same time. I know Kafka clients can consume multiple topics concurrently, but I'm not sure what the best practices are for that (see the subscription sketch after this list for what I have in mind).
- Please share your concerns.
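
For the multi-topic consumption, this is what I mean, a single consumer subscribing either to an explicit list of ~100 entity topics or to a pattern (group/topic names and the handling are placeholders):

```python
from confluent_kafka import Consumer

# Hypothetical config for illustration only.
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "entity-readers",
    "auto.offset.reset": "earliest",
})

assigned_entity_ids = ["1001", "1002"]  # placeholder: ~100 ids in practice

# Option A: explicit list of the topics this consumer should own.
consumer.subscribe([f"entity-{eid}" for eid in assigned_entity_ids])

# Option B: pattern subscription (librdkafka treats names starting with '^' as a regex),
# letting the consumer group balance all matching topics across its members.
# consumer.subscribe(["^entity-.*"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(msg.topic(), msg.value())  # placeholder for real handling
```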
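And if deleting old topics does turn out to be necessary, I assume cleanup would look roughly like this (topic names are placeholders):

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "kafka:9092"})

# Hypothetical: delete the per-entity topics of entities that expired last week.
expired = ["entity-1001", "entity-1002"]
futures = admin.delete_topics(expired, operation_timeout=30)

for topic, future in futures.items():
    try:
        future.result()          # raises if deletion failed
        print(f"deleted {topic}")
    except Exception as exc:
        print(f"failed to delete {topic}: {exc}")
```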
Requirements
- Please focus on the potential problems of this architecture and try not to discuss alternative architectures (fewer topics, more consumers, etc.).