35

We are trying to consume from a Kafka cluster using the Java client. The cluster is behind a jump host, so the only way to access it is through an SSH tunnel. However, we are not able to read, because once the consumer fetches the cluster metadata it uses the original hosts to connect to the brokers. Can this behaviour be overridden? Can we ask the Kafka client not to use the metadata?

Sourabh
    You can't turn off metadata request or the client will have no way to know which broker has the current leader for each partition. This should work if the brokers are configured properly to advertise the ip or domain name of the tunnel as well as the normal internal listen ports – Hans Jespersen Aug 24 '17 at 15:00
  • My solution is edit the hosts file https://unix.stackexchange.com/a/703898/527277 – Jiarui Tian May 26 '22 at 06:44
  • The official way is to change the `advertised.listeners` setting. Changing the host file is a hack. Have a look at my answer: https://stackoverflow.com/a/75530317/828366 – Francesco Casula Feb 22 '23 at 09:05

5 Answers

26

Not as far as I know.

The trick I used when I needed to do something similar was:

  1. set up a virtual interface for each Kafka broker
  2. open a tunnel to each broker so that broker n is bound to virtual interface n
  3. configure your /etc/hosts file so that the advertised hostname of broker n resolves to the IP of virtual interface n

Example:

Kafka brokers:

  • broker1 (advertised as broker1.mykafkacluster)
  • broker2 (advertised as broker2.mykafkacluster)

Virtual interfaces:

  • veth1 (192.168.1.1)
  • veth2 (192.168.1.2)

Tunnels:

  • broker1: ssh -L 192.168.1.1:9092:broker1.mykafkacluster:9092 jumphost
  • broker2: ssh -L 192.168.1.2:9092:broker2.mykafkacluster:9092 jumphost

/etc/hosts:

  • 192.168.1.1 broker1.mykafkacluster
  • 192.168.1.2 broker2.mykafkacluster

If you configure your system like this you should be able to reach all the brokers in your Kafka cluster.

Note: if you configured your Kafka brokers to advertise an IP address instead of a hostname, the procedure can still work, but you need to configure the virtual interfaces with the same IP address that the broker advertises.
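The three steps above can be sketched as a small script that only prints the per-broker commands for review before running them with root privileges. The broker names, the `jumphost` alias, and the 192.168.1.x range are the placeholders from this answer, not real values:

```shell
#!/bin/sh
# Sketch only: generate the setup commands (steps 1-3 above) for each
# broker so they can be reviewed before being executed as root.
cmds=""
i=1
for broker in broker1.mykafkacluster broker2.mykafkacluster; do
  ip="192.168.1.$i"
  cmds="$cmds
sudo ip addr add $ip/32 dev lo                # step 1: virtual interface
ssh -fN -L $ip:9092:$broker:9092 jumphost     # step 2: tunnel to broker
echo '$ip $broker' | sudo tee -a /etc/hosts   # step 3: hosts entry"
  i=$((i + 1))
done
printf '%s\n' "$cmds"
```

Pipe the output into a file, check it, and run it once it matches your cluster's advertised names and ports.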

nivox
  • I don't have the liberty to do all this, as there are many Kafka clusters I need to connect to and the clients I want to connect from are also dynamic, so I cannot really change /etc/hosts manually. Also, I don't have access to the brokers to change the advertised IPs. – Sourabh Aug 26 '17 at 04:27
  • When I create a vif (sudo ip addr add 10.x.x.x/32 dev eth0) with the IP of my broker, I can't access the broker anymore (I get "could not establish connection") until I delete the vif again. (My brokers advertise only an IP, so I'm creating a vif with the same IP as the broker, as you suggested.) – DenCowboy May 19 '20 at 13:51
  • Works like a charm, but you have to create a virtual interface and tunnel for every broker for it to work properly. To create a virtual interface on macOS: sudo ifconfig lo0 alias 192.168.1.1 (this creates an alias on the loopback interface). Repeat this for all brokers. – Ahmed Abbas Nov 17 '21 at 11:15
12

You don't actually have to add virtual interfaces to access the brokers via SSH tunnel if they advertise a hostname. It's enough to add a hosts entry in /etc/hosts on your client and bind the tunnel to the added name.

Assuming broker.kafkacluster is the advertised.hostname of your broker:

/etc/hosts:
127.0.2.1 broker.kafkacluster

Tunnel:
ssh -L broker.kafkacluster:9092:broker.kafkacluster:9092 <brokerhostip/name>

fwendlandt
7

Try sshuttle like this:

sshuttle -r user@host broker-1-ip:port broker-2-ip:port broker-3-ip:port

Of course, the list of brokers depends on the advertised listeners setting of each broker.

injecto
4

The best solution for me by far was to use kafkatunnel (https://github.com/simple-machines/kafka-tunnel). Worked like a charm.

lindelof
0

Changing the /etc/hosts file is NOT the right way.

Quoting Confluent blog post:

I saw a Stack Overflow answer suggesting to just update my hosts file…isn’t that easier?

This is nothing more than a hack to work around a misconfiguration instead of actually fixing it.

You need to set advertised.listeners (or KAFKA_ADVERTISED_LISTENERS if you’re using Docker images) to the external address (host/IP) so that clients can correctly connect to it. Otherwise, they’ll try to connect to the internal host address—and if that’s not reachable, then problems ensue.

Confluent blog post

Additionally you can have a look at this Pull Request on GitHub where I wrote an integration test to connect to Kafka via SSH. It should be easy to understand even if you don't know Golang.

There you have a full client and server example (see TestSSH). The test is bringing up actual Docker containers and it runs assertions against them.

TL;DR I had to configure the KAFKA_ADVERTISED_LISTENERS when connecting over SSH so that the host advertised by each broker would be one reachable from the SSH host. This is because the client connects to the SSH host first and then from there it connects to a Kafka broker. So the host in the advertised.listeners must be reachable from the SSH server.
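For reference, a common way to express this is a dual-listener broker configuration, so internal clients and tunneled clients each get an address they can reach. This is only an illustrative server.properties sketch (the listener names, `broker1.internal`, and the port numbers are made-up placeholders), assuming the SSH tunnel delivers traffic to the client's localhost:9093:

```properties
# Sketch of a dual-listener setup (illustrative names/ports).
# Clients inside the network connect via INTERNAL; tunneled clients are
# advertised an EXTERNAL address reachable from their end of the tunnel.
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://broker1.internal:9092,EXTERNAL://localhost:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```

In the Docker images the same values go into KAFKA_ADVERTISED_LISTENERS and the related environment variables, as the quoted blog post notes.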

Francesco Casula