I have 3 GKE clusters sitting in 3 different regions on Google Cloud Platform. I would like to create a Kafka cluster that has one ZooKeeper and one Kafka node (broker) in every region (i.e. in each GKE cluster).

This set-up is intended to survive regional failure (I know a whole GCP region going down is rare and highly unlikely).

I am attempting this set-up using the Kafka Helm chart from the incubator repository.

I tried this setup manually on 3 GCP VMs following this guide and I was able to do it without any issues.

However, setting up a Kafka cluster on Kubernetes seems complicated.

As we know, we have to list all the ZooKeeper servers in each ZooKeeper configuration file, like below:

...
# list of servers
server.1=0.0.0.0:2888:3888
server.2=<IP of second server>:2888:3888
server.3=<IP of third server>:2888:3888
...

As I can see, the config-script.yaml file in the Helm chart contains a script that creates the ZooKeeper configuration file for every deployment.

The part of the script that echoes the ZooKeeper servers looks something like this:

...
for (( i=1; i<=$ZK_REPLICAS; i++ ))
do
   echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> $ZK_CONFIG_FILE
done
...

As of now, with one replica (replica here means Kubernetes Pod replicas), the configuration this Helm chart creates contains only the ZooKeeper server below.

...
# "release-name" is the name of the Helm release
server.1=release-name-zookeeper-0.release-name-zookeeper-headless.default.svc.cluster.local:2888:3888
...
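To see what the chart's loop produces, it can be run locally; the variable values below are assumptions mirroring the chart's defaults with three replicas (a sketch, not the chart itself). Note that all three entries still point into a single cluster's headless-service domain, which is exactly the problem when the pods live in different GKE clusters:

```shell
# Hypothetical values mirroring the chart's environment variables
ZK_REPLICAS=3
NAME=release-name-zookeeper
DOMAIN=release-name-zookeeper-headless.default.svc.cluster.local
ZK_SERVER_PORT=2888
ZK_ELECTION_PORT=3888
ZK_CONFIG_FILE=/tmp/zoo.cfg

: > "$ZK_CONFIG_FILE"
# same loop as in config-script.yaml
for (( i=1; i<=ZK_REPLICAS; i++ ))
do
  echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> "$ZK_CONFIG_FILE"
done
cat "$ZK_CONFIG_FILE"
# server.1=release-name-zookeeper-0.release-name-zookeeper-headless.default.svc.cluster.local:2888:3888
# server.2=release-name-zookeeper-1.release-name-zookeeper-headless.default.svc.cluster.local:2888:3888
# server.3=release-name-zookeeper-2.release-name-zookeeper-headless.default.svc.cluster.local:2888:3888
```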

At this point I am stuck: what do I do so that all the ZooKeeper servers (one per GKE cluster) get included in the configuration file?

How shall I modify the script?

Amit Yadav
  • Updated tags to reflect that you haven't started with Kafka yet – OneCricketeer Oct 30 '19 at 20:27
  • @cricket_007 My end-goal is to set-up Kafka cluster. So I have added `kafka-cluster` tag and put `Kafka/Zookeeper` in the question. Looks okay? – Amit Yadav Oct 30 '19 at 20:33
  • I understand the end goal. [tag:kafka-cluster] doesn't get the same following of [tag:apache-kafka] – OneCricketeer Oct 30 '19 at 21:17
  • If you are not planning to deploy and manage it yourself, then there are a number of "Kafka as a service" managed offerings (not giving the vendor names here) on GCP. GCP also has Pub/Sub, a managed messaging service similar to Kafka – Prashant Oct 31 '19 at 05:19
  • Also there is this service from MarketPlace but on Compute https://console.cloud.google.com/marketplace/details/click-to-deploy-images/kafka?walkthrough_tutorial_id=toc – Prashant Oct 31 '19 at 05:21
  • @Prashant thank you for the suggestion and resources but I would like to set-up my own Kafka-cluster running on 3 GKE clusters (all being different regions). – Amit Yadav Oct 31 '19 at 07:21

1 Answer

I see you are trying to create a 3-node ZooKeeper cluster on top of 3 different GKE clusters.

This is not an easy task, and I am sure there are multiple ways to achieve it, but I will show you one way to do it that I believe should solve your problem.

The first thing you need to do is create a LoadBalancer service for every ZooKeeper instance. After the LoadBalancers are created, note down the IP addresses that got assigned (remember that by default these IP addresses are ephemeral, so you may want to promote them to static addresses later).
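As a sketch of that first step (the service name, release name, and ports are assumptions based on the chart's defaults; the `statefulset.kubernetes.io/pod-name` label is added automatically to StatefulSet pods, which lets one Service target exactly one pod), an internal LoadBalancer per ZooKeeper pod could look like:

```shell
# Sketch: expose one ZooKeeper pod via an internal LoadBalancer.
# Repeat per pod/cluster with the pod name adjusted.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-0-lb
  annotations:
    # keep the endpoint private to your VPC (GKE internal LB annotation)
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    # set automatically on every StatefulSet pod
    statefulset.kubernetes.io/pod-name: release-name-zookeeper-0
  ports:
    - name: client
      port: 2181
    - name: server
      port: 2888
    - name: leader-election
      port: 3888
EOF
```

Using an internal LoadBalancer assumes the three GKE clusters can reach each other's VPC addresses (e.g. shared VPC or peering); otherwise an external LoadBalancer with `loadBalancerSourceRanges` would be needed.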

The next thing to do is create a private DNS zone on GCP and add an A record for every ZooKeeper LoadBalancer endpoint, e.g.:

release-name-zookeeper-1.zookeeper.internal.
release-name-zookeeper-2.zookeeper.internal.
release-name-zookeeper-3.zookeeper.internal.
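Creating the zone and records with gcloud could look roughly like this (the zone name, network, and LoadBalancer IPs below are placeholders; substitute the addresses noted down earlier):

```shell
# Sketch: private Cloud DNS zone visible to the VPC network
gcloud dns managed-zones create zookeeper-internal \
  --description="Private zone for ZooKeeper" \
  --dns-name="zookeeper.internal." \
  --visibility=private \
  --networks=default

# One A record per ZooKeeper LoadBalancer IP (placeholder IPs)
gcloud dns record-sets transaction start --zone=zookeeper-internal
i=1
for ip in 10.0.1.10 10.1.1.10 10.2.1.10; do
  gcloud dns record-sets transaction add "$ip" \
    --name="release-name-zookeeper-$i.zookeeper.internal." \
    --ttl=300 --type=A --zone=zookeeper-internal
  i=$((i+1))
done
gcloud dns record-sets transaction execute --zone=zookeeper-internal
```

The private zone must be attached (via `--networks`) to every VPC network the three GKE clusters run in, or the records will not resolve from all clusters.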

and in GCP it would look like this:

(screenshot: the three A records in the Cloud DNS private zone)

After it's done, just modify this line:

...
DOMAIN=`hostname -d`
...

to something like this:

...
DOMAIN={{ .Values.domain }}
...

and remember to set the `domain` variable in the values file to `zookeeper.internal`,

so in the end it should look like this:

DOMAIN=zookeeper.internal

and it should generate the following config:

...
server.1=release-name-zookeeper-1.zookeeper.internal:2888:3888
server.2=release-name-zookeeper-2.zookeeper.internal:2888:3888
server.3=release-name-zookeeper-3.zookeeper.internal:2888:3888
...
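A quick sanity check (pod name assumed from the chart's naming, and assuming `nslookup` is available in the image) is to resolve one of the records from inside a cluster to confirm the private zone is attached to that cluster's network:

```shell
# Should return the LoadBalancer IP created for the third ZooKeeper
kubectl exec release-name-zookeeper-0 -- \
  nslookup release-name-zookeeper-3.zookeeper.internal
```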

Let me know if it is helpful

Matt
  • thanks, I think this is the second step. The first step would be to create the zookeeper as external service, how do we do that? – Amit Yadav Nov 06 '19 at 04:47
  • Use a regular k8s service with `type: LoadBalancer`. Look [here](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/) for examples. Another important thing you should consider is using `loadBalancerSourceRanges` (for security reasons), take a look [here](https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/) for more examples. – Matt Nov 06 '19 at 07:41
  • I understand that, but what configuration changes do I have to make in the Helm chart so that the ZooKeeper instances get exposed as LoadBalancer services? – Amit Yadav Nov 06 '19 at 14:04
  • But that is applicable to the Kafka Helm chart, not the Zookeeper one which is [this one](https://github.com/helm/charts/tree/master/incubator/zookeeper). So, this Helm chart is written keeping only one cluster in mind in which Zookeepers connect using Headless services and everything works well. But in the case of multiple clusters like in my case, it fails. Because headless services are not reachable by name to outside the cluster – Amit Yadav Nov 06 '19 at 14:24
  • [here](https://github.com/helm/charts/blob/ca895761948d577df1cb37243b6afaf7b077bac3/incubator/zookeeper/templates/service.yaml) add `type: LoadBalancer`. Also look [here](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) for examples – Matt Nov 06 '19 at 14:25
  • @AmitYadav Yes, sorry, I deleted this comment and corrected myself – Matt Nov 06 '19 at 14:25