
This is my first post on Stack Overflow; I hope I didn't choose the wrong section.

Context:

Kafka heap size is configured in the following file:

/etc/systemd/system/kafka.service

with the following parameter:

Environment="KAFKA_HEAP_OPTS=-Xms6g -Xmx6g"
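
For reference, the effective setting can be double-checked with standard systemd and ps commands (nothing specific to this setup):

# confirm systemd passes the variable to the service
systemctl show kafka --property=Environment

# confirm the running JVM was actually started with those flags
ps -ef | grep -i kafka | tr ' ' '\n' | grep -E '^-Xm[sx]'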

OS is "CentOS Linux release 7.7.1908".

Kafka is "confluent-kafka-2.12-5.3.1-1.noarch", installed from the following repository:

# Confluent REPO
[Confluent.dist]
name=Confluent repository (dist)
baseurl=http://packages.confluent.io/rpm/5.3/7
gpgcheck=1
gpgkey=http://packages.confluent.io/rpm/5.3/archive.key
enabled=1

[Confluent]
name=Confluent repository
baseurl=http://packages.confluent.io/rpm/5.3
gpgcheck=1
gpgkey=http://packages.confluent.io/rpm/5.3/archive.key
enabled=1

I enabled SSL on a 3-machine Kafka cluster a few days ago, and suddenly the following command stopped working:

kafka-topics --bootstrap-server <the.fqdn.of.server>:9093 --describe --topic <TOPIC-NAME>

It now returns the following error:

[2019-10-03 11:38:52,790] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':(org.apache.kafka.common.utils.KafkaThread) 
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:539)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1152)
    at java.lang.Thread.run(Thread.java:748)

In the server's log, the following line appears when I query it via "kafka-topics":

/var/log/kafka/server.log:
[2019-10-03 11:41:11,913] INFO [SocketServer brokerId=<ID>] Failed authentication with /<ip.of.the.server> (SSL handshake failed) (org.apache.kafka.common.network.Selector)
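
In case it helps with debugging, TLS handshake details can be dumped by running the client with the standard JVM SSL debug flag (the Kafka scripts pass KAFKA_OPTS to the JVM):

# verbose TLS handshake logging for the CLI tools (generic JVM flag)
export KAFKA_OPTS="-Djavax.net.debug=ssl,handshake"
kafka-topics --bootstrap-server <the.fqdn.of.server>:9093 --describe --topic <TOPIC-NAME>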

I was able to use this command properly BEFORE enabling SSL on the cluster. Here is the configuration I'm using. Everything works properly (consumers, producers...) except "kafka-topics":

# SSL Configuration
ssl.truststore.location=<truststore-path>
ssl.truststore.password=<truststore-password>
ssl.keystore.type=<keystore-type>
ssl.keystore.location=<keystore-path>
ssl.keystore.password=<keystore-password>

# Enable SSL between brokers
security.inter.broker.protocol=SSL

# Listeners
listeners=SSL://<fqdn.of.the.server>:9093
advertised.listeners=SSL://<fqdn.of.the.server>:9093
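
For reference, both stores can be inspected with standard keytool commands (same placeholders as above), e.g. to confirm the certificate chain and that the internal CA is present in the truststore:

keytool -list -v -keystore <keystore-path> -storepass <keystore-password>
keytool -list -v -keystore <truststore-path> -storepass <truststore-password>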

There is no problem with the certificate itself (it is signed by an internal CA, which I added to the truststore specified in the configuration). OpenSSL shows no errors:

openssl s_client -connect <fqdn.of.the.server>:9093 -tls1
>> Verify return code: 0 (ok)

The following command works fine with SSL, thanks to the "--consumer.config client-ssl.properties" parameter:

kafka-console-consumer --bootstrap-server <fqdn.of.the.server>:9093 --topic <TOPIC-NAME> --consumer.config client-ssl.properties

"client-ssl.properties" content is :

security.protocol=SSL
ssl.truststore.location=<truststore-path>
ssl.truststore.password=<truststore-password>

Right now, I'm forced to use "--zookeeper", which according to the documentation is deprecated:

--zookeeper <String: hosts>              DEPRECATED, The connection string for
                                           the zookeeper connection in the form
                                           host:port. Multiple hosts can be
                                           given to allow fail-over.

And of course, it works fine:

kafka-topics --zookeeper <fqdn.of.the.server>:2181 --describe --topic <TOPIC-NAME>
Topic:<TOPIC-NAME>  PartitionCount:3    ReplicationFactor:2 Configs:
Topic: <TOPIC-NAME> Partition: 0    Leader: <ID-3>      Replicas: <ID-3>,<ID-1> Isr: <ID-1>,<ID-3>
Topic: <TOPIC-NAME> Partition: 1    Leader: <ID-1>      Replicas: <ID-1>,<ID-2> Isr: <ID-2>,<ID-1>
Topic: <TOPIC-NAME> Partition: 2    Leader: <ID-2>      Replicas: <ID-2>,<ID-3> Isr: <ID-2>,<ID-3>

So, my question is: why am I unable to use "--bootstrap-server" at the moment? Because of the "--zookeeper" deprecation, I'm worried about losing the ability to inspect my topics and their details...

I believe kafka-topics needs the same kind of option as kafka-console-consumer, i.e. "--consumer.config"...

Ask if you need any additional details.

Thanks a lot, hope my question is clear and readable.

Blyyyn

  • Kafka topics command can use bootstrap servers... But you need to increase the heap space for your commands, it seems – OneCricketeer Oct 03 '19 at 13:07
  • Hi cricket_007, thanks for the reply. Actually my heap is 6 GB, see the systemd environment settings: Environment="KAFKA_HEAP_OPTS=-Xms6g -Xmx6g". Just updated the post; thanks for pointing out that I didn't say much about my heap size configuration. – Blyyyn Oct 03 '19 at 14:25
  • Are you running the `kafka-topics` command from the broker, then? If not, you still need to export it before running the command on a remote system. – OneCricketeer Oct 03 '19 at 15:18
  • I only tried requests between brokers (i.e. kafka-A doing a bootstrap-server request to kafka-B/C, or to itself). The SSL error mentioned in the main post ("SSL handshake failed") is, I guess, the main issue I should focus on... But I don't really know how, since other commands, and the cluster itself, work without any problem. Could it be an issue I should report to the Kafka project? – Blyyyn Oct 04 '19 at 07:48

4 Answers


I finally found a way to deal with this SSL error. The key is to use the following option:

--command-config client-ssl.properties

This works with most Kafka commands, like kafka-consumer-groups, and of course kafka-topics. As far as I understand, the OutOfMemoryError from the question occurs because, without this option, the admin client talks plaintext to the SSL port and misreads the first TLS handshake bytes as an enormous message size, which it then tries to allocate. See the examples below:

kafka-consumer-groups --bootstrap-server <kafka-hostname>:<kafka-port> --group <consumer-group> --topic <topic> --reset-offsets --to-offset <offset> --execute --command-config <ssl-config>

kafka-topics --list --bootstrap-server <kafka-hostname>:<kafka-port> --command-config client-ssl.properties
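
Applied to the original failing command from the question, this becomes:

kafka-topics --bootstrap-server <the.fqdn.of.server>:9093 --describe --topic <TOPIC-NAME> --command-config client-ssl.properties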

<ssl-config> was "client-ssl.properties"; see the initial post for its content. Beware: if you use an IP address for <kafka-hostname>, you'll get an error if the machine's certificate doesn't have a subject alternative name with that IP address. Try to have correct DNS resolution and use the FQDN if possible.
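
One way to check which subject alternative names the broker certificate actually carries is standard openssl usage (the grep pattern is just illustrative):

openssl s_client -connect <fqdn.of.the.server>:9093 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'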

Hope this solution will help, cheers!

Blyyyn

Stop your brokers and run the command below (assuming you have more than 1.5 GB of RAM on your server):

export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"

Then start your brokers on all 3 nodes and try it again.

  • Hello Ashutosh, thanks for the reply. As I said to cricket_007, I set my heap params in the systemd environment as follows: Environment="KAFKA_HEAP_OPTS=-Xms6g -Xmx6g". I really doubt this is a heap error... Plus this is a brand new cluster, i.e. not a lot of traffic for now. Just updated the post; thanks for pointing out that I didn't say much about my heap size configuration. – Blyyyn Oct 03 '19 at 14:27
  • The error comes from the clients, not the brokers, anyway, so they shouldn't be restarted – OneCricketeer Oct 03 '19 at 15:19
  • I notice that when exporting the KAFKA_HEAP_OPTS value in my bash terminal, the behaviour is slightly different: now it spams the following (same) error multiple times (it appeared just once previously): "[2019-10-04 09:56:34,794] INFO [SocketServer brokerId=11] Failed authentication with /6.101.80.11 (SSL handshake failed) (org.apache.kafka.common.network.Selector)". This won't stop until I Ctrl+C the "kafka-topics" command. – Blyyyn Oct 04 '19 at 07:57
  • ERRATA: The command stopped after ~2 min with the following error: ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. (no firewall at all, no iptables, no SELinux errors...) – Blyyyn Oct 04 '19 at 08:24
  • Can you check 2 things: 1) if SELinux is disabled, 2) if port 9093 is open. – Ashutosh Singh Oct 04 '19 at 16:45
  • Disabling SELinux doesn't change anything, and there is no firewall between cluster members or a local machine firewall like iptables – Blyyyn Oct 08 '19 at 09:09

Note that for consumer and producer clients you need to prefix security.protocol accordingly inside your client-ssl.properties.


For Kafka Consumers:

consumer.security.protocol=SASL_SSL

For Kafka Producers:

producer.security.protocol=SASL_SSL
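
For the console clients themselves, such a properties file is typically passed with the dedicated flags of the standard Kafka tooling (a sketch, not part of the original answer):

kafka-console-producer --broker-list <fqdn.of.the.server>:9093 --topic <TOPIC-NAME> --producer.config client-ssl.properties
kafka-console-consumer --bootstrap-server <fqdn.of.the.server>:9093 --topic <TOPIC-NAME> --consumer.config client-ssl.properties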
Giorgos Myrianthous

An OOM Java heap-space exception can have multiple causes. In my case (AWS MSK Kafka, protected by IAM auth), it was due to a permission/access restriction. I had to specify a client.properties file while running the kafka-topics.sh script: --command-config /path/to/client.properties. See here: youtube.com/watch?v=r12HYxWAJLo&t=477s
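
For context, a client.properties for MSK with IAM auth typically looks like the sketch below; the property names come from the aws-msk-iam-auth library, which must be on the client's classpath (an assumption beyond what this answer states):

# requires the aws-msk-iam-auth jar on the classpath
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler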

Anum Sheraz