In my dev environment I have deployed a Kafka cluster in KRaft mode with three nodes, each acting as both controller and broker. After the machines are deployed everything seems fine, but about 10 minutes later node 1 logs that nodes 2 and 3 have disconnected.

Logs:

  • Node 1: nodes 2 and 3 disconnected
  • Node 2: nothing about disconnected nodes
  • Node 3: node 2 disconnected

Logs on node 1 (the two WARN messages appear because node 3 was still being rolled out at that moment):

[2023-04-14 15:18:41,618] INFO [SocketServer listenerType=BROKER, nodeId=1] Enabling request processing. (kafka.network.SocketServer)
[2023-04-14 15:18:41,621] INFO [StandardAuthorizer 1] Completed initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData)
[2023-04-14 15:18:41,626] INFO [Controller 1] The request from broker 1 to unfence has been granted because it has caught up with the offset of it's register broker record 1. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2023-04-14 15:18:41,646] INFO [Controller 1] Unfenced broker: 1 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:18:41,647] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:18:41,648] WARN [RaftManager nodeId=1] Connection to node 3 (/172.31.38.35:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:18:41,700] INFO [BrokerLifecycleManager id=1] The broker has been unfenced. Transitioning from RECOVERY to RUNNING. (kafka.server.BrokerLifecycleManager)
[2023-04-14 15:18:41,701] INFO [BrokerServer id=1] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2023-04-14 15:18:41,703] INFO Kafka version: 3.3.1 (org.apache.kafka.common.utils.AppInfoParser)
[2023-04-14 15:18:41,703] INFO Kafka commitId: e23c59d00e687ff5 (org.apache.kafka.common.utils.AppInfoParser)
[2023-04-14 15:18:41,703] INFO Kafka startTimeMs: 1681478321701 (org.apache.kafka.common.utils.AppInfoParser)
[2023-04-14 15:18:41,704] INFO [KafkaRaftServer nodeId=1] Kafka Server started (kafka.server.KafkaRaftServer)
[2023-04-14 15:18:41,778] INFO [Controller 1] The request from broker 2 to unfence has been granted because it has caught up with the offset of it's register broker record 3. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2023-04-14 15:18:41,778] INFO [Controller 1] Unfenced broker: 2 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:18:42,103] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:18:42,103] WARN [RaftManager nodeId=1] Connection to node 3 (/172.31.38.35:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:18:42,658] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:19:18,348] INFO [Controller 1] Added new fenced broker: 3 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:19:18,349] INFO [Controller 1] Registered new broker: RegisterBrokerRecord(brokerId=3, incarnationId=X_1FgQvvQkSE8ybWbaKWWg, brokerEpoch=80, endPoints=[BrokerEndpoint(name='SSL', host='172.31.38.35', port=9094, securityProtocol=1)], features=[BrokerFeature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=7)], rack=null, fenced=true, inControlledShutdown=false) (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:19:18,818] INFO [Controller 1] The request from broker 3 to unfence has been granted because it has caught up with the offset of it's register broker record 80. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2023-04-14 15:19:18,819] INFO [Controller 1] Unfenced broker: 3 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:28:41,407] INFO [RaftManager nodeId=1] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:29:17,391] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
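
To pin down the "after 10 minutes" timing more precisely, the delay between each broker being unfenced and its later disconnect can be computed from the log above. This is only a rough sketch: the function name `disconnect_delays` and the regular expressions are my own, and it assumes the log line layout shown here.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" timestamp of a Kafka log line.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")
UNFENCED = re.compile(r"Unfenced broker: (\d+)")
DISCONNECTED = re.compile(r"Node (\d+) disconnected\.")

SAMPLE_LOG = """\
[2023-04-14 15:18:41,647] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:18:41,778] INFO [Controller 1] Unfenced broker: 2 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:19:18,819] INFO [Controller 1] Unfenced broker: 3 (org.apache.kafka.controller.ClusterControlManager)
[2023-04-14 15:28:41,407] INFO [RaftManager nodeId=1] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-04-14 15:29:17,391] INFO [RaftManager nodeId=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
"""


def disconnect_delays(log_text):
    """Return {node_id: [seconds between unfence and each later disconnect]}."""
    unfenced_at = {}
    delays = {}
    for line in log_text.splitlines():
        ts_match = TIMESTAMP.match(line.strip())
        if not ts_match:
            continue
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S,%f")

        unfenced = UNFENCED.search(line)
        if unfenced:
            unfenced_at[int(unfenced.group(1))] = ts
            continue

        disconnected = DISCONNECTED.search(line)
        if disconnected:
            node = int(disconnected.group(1))
            # Disconnects logged before the node was unfenced (startup noise) are skipped.
            if node in unfenced_at:
                delay = (ts - unfenced_at[node]).total_seconds()
                delays.setdefault(node, []).append(delay)
    return delays


print(disconnect_delays(SAMPLE_LOG))
```

For the lines above, both delays come out just under 600 seconds, which looks more like a fixed timer expiring than a random network fault. The logs themselves do not say which timer that would be.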

server.properties (node 1):

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@172.31.46.13:9093,2@172.31.35.247:9093,3@172.31.38.35:9093

listeners=CONTROLLER://:9093,SSL://:9094
advertised.listeners=SSL://172.31.46.13:9094
controller.listener.names=CONTROLLER
security.inter.broker.protocol=SSL

ssl.enabled.protocols=TLSv1.3,TLSv1.2,TLSv1.1,TLSv1
ssl.client.auth=required
ssl.keystore.location=/opt/kafka/certs/kafka-01.keystore.jks
ssl.keystore.password=
ssl.key.password=
ssl.truststore.location=/opt/kafka/certs/server.truststore.jks
ssl.truststore.password=

authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
super.users=User:kafka-01;User:kafka-02;User:kafka-03
ssl.principal.mapping.rules=RULE:^CN=(.*?),OU=Test,O=Test.*$/$1/
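
To rule out plain connectivity problems between the quorum members, the `controller.quorum.voters` value above can be split into its `id@host:port` entries and each one probed. A minimal sketch, assuming that entry format; `parse_voters` and `is_reachable` are hypothetical helper names, and the probe only tests a TCP connect, not the TLS handshake or the Kafka protocol.

```python
import socket
from typing import List, NamedTuple


class Voter(NamedTuple):
    node_id: int
    host: str
    port: int


def parse_voters(value: str) -> List[Voter]:
    """Split a controller.quorum.voters value into (node_id, host, port) entries."""
    voters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        node_id, _, address = entry.partition("@")
        # rpartition keeps any earlier colons in the host part intact.
        host, _, port = address.rpartition(":")
        voters.append(Voter(int(node_id), host, int(port)))
    return voters


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `is_reachable(v.host, v.port)` for every entry returned by `parse_voters(...)` on each of the three nodes would show whether the controller port stays reachable around the time the disconnects are logged.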

I also tried a six-node cluster (three controllers and three brokers), and the same thing happens.

Does anyone know whether this is normal behaviour? The log level is INFO, which suggests it might be. However, I can't find an explanation for this behaviour or what is causing it. The logs show only the disconnects, with no related entries around them.
