In order to enable high availability in Kafka you need to take into account the following factors:
1. Replication factor: By default, replication factor is set to 1
. The recommended replication-factor
for production environments is 3
which means that 3 brokers are required.
2. Preferred Leader Election: When a broker is taken down, one of the replicas becomes the new leader for a partition. Once the broker that has failed is up and running again, it has no leader partitions and Kafka restores the information it missed while it was down, and it becomes the partition leader again. Preferred leader election is enabled by default. In order to minimize the risk of losing messages when switching back to the preferred leader you need to set the producer property acks
to all
(obviously this comes at a performance cost).
3. Unclean Leader Election:
You can enable unclean leader election in order to allow an out-of-sync replica to become the leader and maintain high availability of a partition. With unclean leader election, messages that were not synced to the new leader are lost. There is a trade-off between consistency and high availability meaning that with unclean leader election disabled, if a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online.
4. Acknowledgements:
Acknowledgements refer to the number of replicas that commit a new message before the message is acknowledged using acks
property. When acks is set to 0
the message is immediately acknowledged without waiting for other brokers to commit. When set to 1
, the message is acknowledged once the leader commits the message. Configuring acks
to all
provides the highest consistency guarantee but slower writes to the cluster.
5. Minimum in-sync replicas: min.insync.replicas
defines the minimum number o in-sync replicas that must be available for the producer in order to successfully send the messages to the partition.If min.insync.replicas
is set to 2
and acks
is set to all
, each message must be written successfully to at least two replicas. This means that the messages won't be lost, unless both brokers fail (unlikely). If one of the brokers fails, the partition will no longer be available for writes. Again, this is a trade-off between consistency and availability.