So, we started a bunch of Kafka Streams applications without realizing the default replication factor is 1.
We've since made the code changes (see "What should be the replication factor of changelog/repartition topics").
However, I don't think that'll help with applications that have already been deployed or alter internal topics that have already been created.
For example, I used kafkacat to list a handful of topics, filtered by the application.id prefix, and all of them have one replica.
Obviously, when a broker starts having issues (broker.id 11 or 21 here), the applications stop working properly.
topic "appid-KTABLE-SUPPRESS-STATE-STORE-0000000013-changelog" with 1 partitions:
partition 0, leader 11, replicas: 11, isrs: 11
--
topic "appid-KSTREAM-AGGREGATE-STATE-STORE-0000000019-changelog" with 1 partitions:
partition 0, leader 21, replicas: 21, isrs: 21
--
topic "appid-KSTREAM-AGGREGATE-STATE-STORE-0000000009-changelog" with 1 partitions:
partition 0, leader 11, replicas: 11, isrs: 11
--
topic "appid-KSTREAM-AGGREGATE-STATE-STORE-0000000007-changelog" with 1 partitions:
partition 0, leader 21, replicas: 21, isrs: 21
I understand how to increase the replication factor (e.g. "How to change the number of replicas of a Kafka topic?"), but my questions are:
Do the numbers in the topic names (e.g. 0000000013) have a specific meaning other than the processor ordering of Kafka Streams?
How many of these topics should I really be increasing the replication factor for (assuming I am doing it manually, and having to do it for multiple clusters)?
Also: resetting the Streams application to clean up the internal topics doesn't seem like a good option because of how the applications write to downstream systems.
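For reference, here is a minimal sketch of the manual route I have in mind: parse the kafkacat listing and generate the JSON file that kafka-reassign-partitions.sh expects. The function names, the regexes (which assume the kafkacat output format shown above), the target replication factor of 3, and broker id 31 are all assumptions for illustration; only brokers 11 and 21 appear in my listing.

```python
import json
import re

# Two entries copied from the kafkacat listing above.
KAFKACAT_OUTPUT = '''
topic "appid-KTABLE-SUPPRESS-STATE-STORE-0000000013-changelog" with 1 partitions:
    partition 0, leader 11, replicas: 11, isrs: 11
topic "appid-KSTREAM-AGGREGATE-STATE-STORE-0000000019-changelog" with 1 partitions:
    partition 0, leader 21, replicas: 21, isrs: 21
'''

# Assumed line formats, based only on the output shown above.
TOPIC_RE = re.compile(r'topic "([^"]+)" with \d+ partitions')
PARTITION_RE = re.compile(r'partition (\d+), leader \d+, replicas: ([\d,]+)')


def parse_metadata(text):
    """Return {topic: {partition: [replica broker ids]}} from kafkacat -L text."""
    topics = {}
    current = None
    for line in text.splitlines():
        match = TOPIC_RE.search(line)
        if match:
            current = match.group(1)
            topics[current] = {}
            continue
        match = PARTITION_RE.search(line)
        if match and current is not None:
            # The captured group may end with a comma, so drop empty pieces.
            brokers = [int(b) for b in match.group(2).split(",") if b]
            topics[current][int(match.group(1))] = brokers
    return topics


def build_reassignment(topics, brokers, target_rf):
    """Build a reassignment plan that raises under-replicated partitions to target_rf.

    Existing replicas stay first in the list so the current leader remains
    the preferred leader; extra brokers are rotated per partition so the
    additions are spread out a little.
    """
    if len(brokers) < target_rf:
        raise ValueError("not enough brokers for the requested replication factor")
    partitions = []
    for topic, parts in sorted(topics.items()):
        for partition, replicas in sorted(parts.items()):
            if len(replicas) >= target_rf:
                continue  # already sufficiently replicated
            extra = [b for b in brokers if b not in replicas]
            shift = len(partitions) % len(extra)
            extra = extra[shift:] + extra[:shift]
            new_replicas = replicas + extra[: target_rf - len(replicas)]
            partitions.append(
                {"topic": topic, "partition": partition, "replicas": new_replicas}
            )
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Broker 31 is hypothetical; substitute the real broker ids of the cluster.
    plan = build_reassignment(parse_metadata(KAFKACAT_OUTPUT), [11, 21, 31], 3)
    print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file (say reassign.json, a name I made up) and passed to kafka-reassign-partitions.sh with --reassignment-json-file reassign.json --execute, plus --bootstrap-server or --zookeeper depending on the Kafka version. That still leaves the question of which topics actually need it.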