
We are choosing the best option for implementing leader election for our service (written in Java), which is composed of multiple (e.g., 3) instances for high availability. Our goal is to have only a single instance active at any given time.

It would be great to hear your opinion on the following options:

1) Hazelcast. Using a "quorum" and a lock, we can implement leader election. However, we can run into a split-brain problem, where two leaders may be present for some time. Also, it seems that Hazelcast does not support SSL.

2) Zookeeper. We can implement leader election on top of a Zookeeper ensemble (with a ZK node running on each instance of our service). Does Zookeeper provide better consistency guarantees than Hazelcast? Does it also suffer from the split-brain problem?

3) Etcd. We can use the Jetcd library, which seems like the most modern and robust technology. Is it really better than Zookeeper in terms of consistency?

Thank you.

Michael

1 Answer


1) Hazelcast, as of version 3.12, provides a CPSubsystem, which is a CP system in terms of CAP and is built on the Raft consensus algorithm inside the Hazelcast cluster. CPSubsystem has a distributed lock implementation called FencedLock, which can be used to implement leader election.

For more information, see the Hazelcast documentation on CPSubsystem and FencedLock.
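As a rough sketch of what FencedLock-based election can look like (assuming Hazelcast 3.12+ on the classpath; the lock name "service-leader" and the CP member count are arbitrary choices for this example, not prescribed values):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;

public class HazelcastLeaderElection {
    public static void main(String[] args) {
        Config config = new Config();
        // The CP Subsystem needs at least 3 members to form its Raft group.
        config.getCPSubsystemConfig().setCPMemberCount(3);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // "service-leader" is an arbitrary lock name chosen for this sketch.
        FencedLock lock = hz.getCPSubsystem().getLock("service-leader");
        lock.lock();                      // blocks until this member holds the lock
        try {
            long fence = lock.getFence(); // monotonically increasing fencing token
            System.out.println("Became leader with fence " + fence);
            // ... perform leader-only work; attach the fence to external writes
            // so a stale ex-leader's requests can be rejected ...
        } finally {
            lock.unlock();
        }
    }
}
```

The fencing token is the key advantage over a plain lock: downstream systems can compare tokens and reject writes from an instance that still believes it is leader after losing the lock.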

Hazelcast versions before 3.12 are not suitable for leader election. As you already mentioned, those versions can choose availability during network splits, which can lead to the election of multiple leaders.

2) Zookeeper doesn't suffer from the split-brain problem you mention; you will not observe multiple leaders when a network split happens. Zookeeper is built on the ZAB atomic broadcast protocol.
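From Java, a common way to build leader election on Zookeeper is Apache Curator's LeaderLatch recipe. A minimal sketch, assuming a ZK ensemble reachable at the placeholder address "localhost:2181" and an arbitrary latch path "/leader":

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLeaderElection {
    public static void main(String[] args) throws Exception {
        // "localhost:2181" and "/leader" are placeholder values for this sketch.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        try (LeaderLatch latch = new LeaderLatch(client, "/leader")) {
            latch.start();
            latch.await();  // blocks until this instance is elected leader
            System.out.println("Became leader: " + latch.getId());
            // ... leader-only work; leadership is lost if the ZK session expires ...
        }
        client.close();
    }
}
```

Note that leadership is tied to the ZK session, so the leader-only work should stop promptly when the latch reports loss of leadership.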

3) Etcd uses the Raft consensus protocol. Raft and ZAB provide similar consistency guarantees, and both can be used to implement a leader election process.
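With Jetcd, one way to get the same effect is a lease-backed lock, so leadership expires automatically if the holder dies. A hedged sketch (the endpoint, lock name, and lease TTL below are placeholder assumptions):

```java
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import java.nio.charset.StandardCharsets;

public class EtcdLeaderElection {
    public static void main(String[] args) throws Exception {
        // "http://localhost:2379" and "leader" are placeholder values for this sketch.
        Client client = Client.builder().endpoints("http://localhost:2379").build();

        // Tie leadership to a 10-second lease: if this process dies and stops
        // refreshing the lease, etcd releases the lock automatically.
        long leaseId = client.getLeaseClient().grant(10).get().getID();

        ByteSequence lockName = ByteSequence.from("leader", StandardCharsets.UTF_8);
        client.getLockClient().lock(lockName, leaseId).get(); // blocks until acquired
        System.out.println("Became leader");
        // ... leader-only work; keep the lease alive for as long as it runs ...

        client.close();
    }
}
```

In a real service you would also refresh the lease periodically (Jetcd's lease client supports keep-alives) and step down when that fails.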

Disclaimer: I work at Hazelcast.

mdogan
  • Thank you for the answer. I have a follow-up question. Does Zookeeper provide any guarantees on the time it will take to detect a failed node? The documentation mentions that it can take up to tens of seconds to get an up-to-date view of the system. Is there any configuration to tweak this time? – Michael Dec 04 '18 at 19:36
  • Created a separate question for the above comment: https://stackoverflow.com/questions/53620789/zookeeper-timeliness-property-consistency-guarantees – Michael Dec 04 '18 at 20:18
  • Also, a follow-up question regarding Hazelcast. We can subscribe to quorum updates, and if there is a quorum up/down event, elect a new leader using a lock. Sure, we will have two leaders for some short period of time due to the split-brain problem, but the same problem can also occur with Zookeeper. Am I missing something? – Michael Dec 04 '18 at 23:40
  • That problem does not happen with Zookeeper, because during split brain only one side will have the majority of the initial nodes. The other side will be in the minority and won't be allowed to write. This is the main point of having a consensus (or atomic broadcast) protocol. For example, assume you have 5 nodes initially. The majority of 5 is 3, so each write must be accepted by at least 3 nodes. When they split into two, one side will have 3 nodes and the other side will have 2, so only the first side will be allowed to write. – mdogan Dec 05 '18 at 07:46
  • Let's assume that one ZK node that has the "leader" client connected to it gets disconnected from the rest of the ZK nodes. The network partition will be detected by this node after some time t1, and by the other ZK nodes at some time t2. Since t1 and t2 depend on network delays and the ZK heartbeat threshold config, t1 may be larger than t2. So, before the "leader" client demotes itself, some client connected to another ZK node takes leadership. Am I missing something here? – Michael Dec 05 '18 at 08:19
  • In the Zookeeper documentation, they provide the following guarantees. Doesn't this mean that we may end up with 2 or 0 leaders during this inconsistent period? "The clients view of the system is guaranteed to be up-to-date within a certain time bound. (***On the order of tens of seconds.***) Either system changes will be seen by a client within this bound, or the client will detect a service outage." – Michael Dec 05 '18 at 08:26
  • @mdogan - Do you have any suggestion on this ? https://stackoverflow.com/questions/59212967/hazelcast-master-node-election-in-eks-aws-is-possible/59295536#59295536 – smilyface Jan 02 '20 at 07:17
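The majority arithmetic from the comment thread above can be expressed directly; this is a plain illustration of the quorum rule, not tied to any particular library:

```java
public class MajorityQuorum {
    // A write is accepted only if a strict majority of the original
    // cluster acknowledges it, so at most one partition can hold a quorum.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    static boolean hasQuorum(int reachableMembers, int clusterSize) {
        return reachableMembers >= majority(clusterSize);
    }

    public static void main(String[] args) {
        // A 5-node ensemble splits 3 / 2: only the 3-node side keeps writing.
        System.out.println(hasQuorum(3, 5)); // true
        System.out.println(hasQuorum(2, 5)); // false
    }
}
```

Because the two sides of a split cannot both reach a strict majority of the original membership, at most one side can elect a leader.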