0

I'm planning to build a Kafka Cluster using two servers, and host Zookeeper on these two servers as well.

The Question is, since Kafka requires Zookeeper to run, what is the best cluster build for zookeeper to implement Kafka Cluster on two servers?

for eg. I'm currently running two zookeepers on both servers and one Kafka on each server, and in the Kafka configuration they point to all Zookeepers.

Is there a better way to do this?

frisky5
  • 29
  • 10
  • 1
    If you lose one ZK, you lose it and a broker? Are both machines on the same power strip? How important is this data, really? – OneCricketeer Feb 28 '20 at 01:00

2 Answers2

2

First of all, you don't have to setup Zookeper and Kafka in the same server. One of the roles of Zookeeper is electing controller. (one of the brokers which is responsible for maintaining the leader/follower relationship for all the partitions) For election; majority of Zookeper nodes must be alive. In your case even one Zookeeper instance is down, you cannot select controller. So there is no difference between having one Zookeper or two. That's why it is recommended to have at least 3 nodes in Zookeeper cluster. By this way you can handle failure of one Zookeeper node.

An addition to this, it is highly recommended to have at least three brokers in your Kafka cluster to maintain both consistency and high availability. (link1, link2)

UPDATE:

As long as you are limited to only two servers, then you can consider sacrificing from high availability by set up your broker by setting min.insync.replicas=2 and having topics with replication.factor=2. If HA is more important than data loss, then you can use min.insync.replicas=1 (default) broker config with again topic replication.factor=2. In this circumstance, your options are these IMHO. (Having one or two Zookeepers is not important as I mentioned above)

H.Ç.T
  • 3,335
  • 1
  • 18
  • 37
  • The problem is that I'm limited to Two Servers only, so whatever the zookeeper build is on these two servers it's be useless. I guess I'll be running both Kafka servers alone, and have a client to replicate whatever messages received to the other. – frisky5 Feb 28 '20 at 06:35
0

I am often faced with the same problem as you do @frisky5 where i would like to achieve a "suboptimal" HA system using only 2 nodes, and thus workarounds are always needed with cloud-native frameworks that rely on the assumption that clusters will have lot of nodes available.

That ain't always the case in real life, is it ;) ?

That being said, i see you essentially having 2 options:

  1. Externalize zookeeper configuration on a replicated storage system using 2 nodes (e.g. DRBD)
  2. Replicate Kafka data volumes entirely on the second nodes and use 2 one-node Kafka clusters that you switch on and off depending on who is the current master node.

I would go for the first option. In that case you would have 2 Kafka servers and one zookeeper server whose ip needs to be static (virtual ip). When the zookeeper node goes down, it is restarted one the second node with same VIP, but it needs to access the synchronized data folder.

I am not too familiar with zookeepers internals and i can't tell you whether it will go in conflict when starting up on a data store who "wasn't its own" but i would guess it makes sense for you to test it using a simple rsync setup.

Another way to achieve consensus if you are using a k3s based kubernetes cluster would be to rely on internal k8s distributed consensus mechanics to "tell Kafka" which node is the leader. This works for the postgresoperator by chruncydata because Patroni is cool ( https://patroni.readthedocs.io/en/latest/kubernetes.html ) but i am not sure if Kafka/zookeeper are that flexible and can communicate with a rest API to set their locks ... Once you have achieved this intermediate step, then you can use a PostgreSQL db as external source of truth for k3s and then it is as simple as syncing the postgres data folder between the machines (easily done with rsync). The beauty of this approach is that it is way more generic and could be used for other systems too.

Let me know what do you think about these two approaches and whether you manage to setup a test environment. If you do on GitHub i can help you out with implementation

Dber
  • 1
  • 1