
I would like to deploy a Kafka cluster across two datacenters with the same number of nodes in each DC. The first DC is used in active mode while the second is passive.

For example, let's say both datacenters have 3 nodes, with 2 in-sync replicas (ISR) in the first DC and one ISR in the second DC.

Is it possible to have a third DC containing an arbiter/witness/observer node such that, in case of failure of one DC, a leader election can succeed with the correct outcome in terms of consistency? MongoDB has such a feature, called a Replica Set Arbiter.

What about deploying ZooKeeper across the three datacenters? From my understanding, ZooKeeper does not hold the Kafka data and should not be contacted for each new record in a Kafka topic, i.e. you do not pay the latency to the third DC for each new record.
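To make the layout concrete, here is a minimal zoo.cfg sketch of what I have in mind, with one voting ZooKeeper node per DC so that a quorum of 2 survives the loss of any single DC (hostnames are hypothetical):

    # zoo.cfg sketch, hypothetical hostnames: one voting node per DC
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk-dc1.example.com:2888:3888
    server.2=zk-dc2.example.com:2888:3888
    server.3=zk-dc3.example.com:2888:3888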

Nicolas Henneaux
  • There is no 'arbiter' role in ZooKeeper, but there is a 'follower', which doesn't participate in elections but syncs data from the leader. My suggestion is to deploy five nodes: two in the 1st DC, two in the 2nd, one in the 3rd. – sel-fish Mar 09 '18 at 02:34
  • @sel-fish What data is sent to the ZooKeeper node? If the node in the 3rd DC does not participate in the election, then if you lose the 1st or 2nd DC the election cannot succeed (2 nodes voting for a quorum of 3). – Nicolas Henneaux Mar 13 '18 at 06:51
  • Sorry, see my previous comment: I meant 'observer', not 'follower'. The existence of an 'observer' does not change a cluster's tolerance of node failures. In what I suggested, "deploy five nodes: two in the 1st, two in the 2nd, one in the 3rd", each of the five nodes participates in the election. – sel-fish Mar 13 '18 at 07:17
  • OK, that was the proposal in the question, but what is the impact? Does each ZK node need to be contacted for each new record in Kafka? – Nicolas Henneaux Mar 13 '18 at 07:23
  • When writing data to a ZK cluster with the deployment I mentioned above, one write operation has to be made persistent on at least 3 nodes, so the latency between the 1st and 3rd DC (or the 2nd and 3rd DC; I assume the 3rd DC is farther away from the other two) doesn't matter. The write latency is determined only by the latency between the 1st and 2nd DC. – sel-fish Mar 13 '18 at 10:46
  • My question is about the communication between Kafka and ZK, not inside ZK. – Nicolas Henneaux Mar 13 '18 at 11:02

1 Answer


There is a presentation from Kafka Summit 2017, One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers, that covers this setup. There is also some interesting information in a Confluent whitepaper, Disaster Recovery for Multi-Datacenter Apache Kafka® Deployments. It says such a setup could work, and it calls the third-DC node an observer node, but it also says no one has ever tried it.
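For reference, ZooKeeper does have a built-in observer role, although, unlike a MongoDB arbiter, an observer does not vote, so a tie-breaking node in the third DC would have to be a full voting participant instead. A sketch of the observer syntax, with hypothetical hostnames:

    # zoo.cfg sketch, hypothetical hostnames
    # On the third-DC node only:
    peerType=observer
    # In the server list of every node, tag the observer:
    server.1=zk-dc1.example.com:2888:3888
    server.2=zk-dc2.example.com:2888:3888
    server.3=zk-dc3.example.com:2888:3888:observer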

ZooKeeper keeps track of the following metadata for Kafka (0.9.0+):

  • Electing a controller - The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. ZooKeeper is used to elect a controller, to make sure there is only one, and to elect a new one if it crashes.
  • Cluster membership - which brokers are alive and part of the cluster. This is also managed through ZooKeeper.
  • Topic configuration - which overrides exist for each topic, where the partitions are located, etc.
  • Quotas - how much data each client is allowed to read and write.
  • ACLs - who is allowed to read from and write to which topic.

More detail on the dependency between Kafka and ZooKeeper can be found in the Kafka FAQ and in an answer on Quora from a Kafka committer working at Confluent.
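Importantly for the latency concern in the question, the brokers keep a session with ZooKeeper for the metadata above, but the produce/consume path does not go through ZooKeeper. Each broker simply lists the whole ensemble and the client library picks a live node. A sketch of the relevant server.properties entries, with hypothetical hostnames:

    # server.properties sketch, hypothetical hostnames
    broker.id=1
    # Rack awareness (Kafka 0.10+) can spread replicas across DCs
    broker.rack=dc1
    # List the full ensemble; the ZooKeeper client picks a live node
    zookeeper.connect=zk-dc1.example.com:2181,zk-dc2.example.com:2181,zk-dc3.example.com:2181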

From the resources I have read, a setup with two DCs (Kafka plus ZooKeeper) and an arbiter/witness/observer ZooKeeper node in a third, higher-latency DC could work, but I haven't found any resource describing an actual experiment with it.
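To match the consistency goal in the question (two in-sync replicas acknowledging every write, no unclean failover), the relevant Kafka settings would be along these lines. This is a sketch matching the 3-replica example, not a tested recommendation:

    # Broker/topic settings (sketch)
    default.replication.factor=3
    min.insync.replicas=2
    # Never elect an out-of-sync replica as leader, even during a DC outage
    unclean.leader.election.enable=false

    # Producer settings (sketch)
    acks=all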

Nicolas Henneaux
  • This setup (DC1-DC2 with low latency, 2 Kafka brokers and 1 ZooKeeper node per DC, plus a third far DC with a single ZooKeeper node) has now been running in production for more than 2 years without resiliency problems. – Nicolas Henneaux Oct 15 '20 at 06:57
  • Nice to hear about your success. Did you need to configure ZooKeeper so that the node in the third DC does not become leader, to avoid getting loads of latency? – Samuel Åslund Mar 17 '22 at 15:47
  • No, all three ZooKeeper nodes are equivalent and any of them can be leader. ZooKeeper is only used when a new leader is needed for a Kafka topic, so in normal usage it is not queried much. – Nicolas Henneaux Mar 17 '22 at 16:43