
Recently our Kafka (1.1.1) brokers went down and our Kafka Streams application stopped working, so we stopped the application manually to silence the alerts.

After Kafka came back up, we restarted our Streams application, but it didn't read any messages from the topic. Reading the logs, we found that the consumers keep discovering the group coordinator and then marking it unavailable again, in an endless loop.

Below are the logs.

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-10-consumer, groupId=dummy-consumer-id] Group coordinator test-kafka01.com:9092 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-1-consumer, groupId=dummy-consumer-id] Discovered group coordinator test-kafka01.com:9092 (id: 2147483644 rack: null)
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-1-consumer, groupId=dummy-consumer-id] (Re-)joining group
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-3-consumer, groupId=dummy-consumer-id] Discovered group coordinator test-kafka01.com:9092 (id: 2147483644 rack: null)
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-3-consumer, groupId=dummy-consumer-id] (Re-)joining group
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-2-consumer, groupId=dummy-consumer-id] Discovered group coordinator test-kafka01.com:9092 (id: 2147483644 rack: null)
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-2-consumer, groupId=dummy-consumer-id] (Re-)joining group
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

[Consumer clientId=dummy-consumer-id-6b3ad573-5b6a-4e89-82c1-1705e3662d55-StreamThread-1-consumer, groupId=dummy-consumer-id] Group coordinator test-kafka01.com:9092 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
loggerName":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator

After much analysis, we decided to change the stream app id (application.id) and restart the application. Everything worked fine, but after some time the same issue happened again.

Please help me debug this issue. We can't afford to change the stream app id every time. Let me know if any other details are required.

  • Check this if it helps: https://stackoverflow.com/questions/40316862/the-group-coordinator-is-not-available-kafka – Ranga Vure May 13 '20 at 13:35
  • @RangaVure Thanks for the link, I went through it. In that question, the issue was that the user was running Kafka on a single broker. I am running Kafka as a cluster of 3 brokers. – Himanshu Tomar May 13 '20 at 14:07

1 Answer


This is likely due to your Kafka Streams app having the wrong bootstrap.servers configuration.

Ensure that every server listed in bootstrap.servers in the app config can be resolved. If some of them are valid and others are not, the Kafka Streams app would have trouble balancing partitions.
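A minimal sketch of the configuration side, assuming a plain Java Kafka Streams setup. It uses java.util.Properties with the real Kafka Streams keys application.id and bootstrap.servers (the string values of StreamsConfig.APPLICATION_ID_CONFIG and StreamsConfig.BOOTSTRAP_SERVERS_CONFIG) so it compiles without the Kafka jars, lists all three brokers, and flags entries that are not well-formed host:port pairs. The class and method names and the hosts test-kafka02.com and test-kafka03.com are hypothetical; only test-kafka01.com and dummy-consumer-id come from the logs above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch only, not the asker's actual config. Plain Properties with the real
// Kafka Streams key names, so no Kafka dependency is needed to compile it.
// test-kafka02.com and test-kafka03.com are hypothetical host names.
class StreamsBootstrapConfig {

    static Properties buildConfig() {
        Properties props = new Properties();
        // Kafka Streams uses application.id as the consumer group id
        // (groupId=dummy-consumer-id in the logs)
        props.setProperty("application.id", "dummy-consumer-id");
        // List every broker of the 3-broker cluster by DNS name, comma separated
        props.setProperty("bootstrap.servers",
                "test-kafka01.com:9092,test-kafka02.com:9092,test-kafka03.com:9092");
        return props;
    }

    // Returns the bootstrap.servers entries that are not host:port with a port
    // in 1..65535; an empty list means every entry is well formed.
    static List<String> invalidEntries(Properties props) {
        List<String> invalid = new ArrayList<>();
        for (String raw : props.getProperty("bootstrap.servers", "").split(",")) {
            String entry = raw.trim();
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                invalid.add(entry); // missing host or port
                continue;
            }
            try {
                int port = Integer.parseInt(entry.substring(colon + 1));
                if (port < 1 || port > 65535) {
                    invalid.add(entry); // port out of range
                }
            } catch (NumberFormatException e) {
                invalid.add(entry); // port is not a number
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("bootstrap.servers = " + props.getProperty("bootstrap.servers"));
        System.out.println("malformed entries: " + invalidEntries(props));
    }
}
```

In a real app these Properties would be passed to new KafkaStreams(topology, props). Because application.id is also the consumer group id, changing it (as the asker did) makes the app join a brand-new consumer group.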

Based on what you describe, if you used IP addresses instead of DNS names for your Kafka brokers, the addresses may have changed after the brokers went down. Ensure the brokers are referenced by resolvable DNS names (not IPs) and that all of them resolve correctly; you can ping them to confirm.
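As an alternative to ping, here is a minimal sketch of the same check done from the JVM with java.net.InetAddress.getAllByName, which is closer to what a Java-based Streams app sees. It also flags entries that are IP literals, following the advice above. The class name, the Status enum and the default host list are hypothetical, and the IPv4 pattern is deliberately rough.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: classify each bootstrap.servers entry as resolvable, an IP literal,
// or unresolvable. Host names used in main() are placeholders.
class BrokerDnsCheck {

    // Rough IPv4 literal check (does not validate octet ranges)
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    enum Status { RESOLVED, IP_LITERAL, UNRESOLVABLE }

    static Status check(String host) {
        // Bracketed hosts such as [::1] are IPv6 literals
        if (IPV4.matcher(host).matches() || host.startsWith("[")) {
            return Status.IP_LITERAL; // works, but breaks if the broker's IP changes
        }
        try {
            InetAddress.getAllByName(host); // throws if the name cannot be resolved
            return Status.RESOLVED;
        } catch (UnknownHostException e) {
            return Status.UNRESOLVABLE;
        }
    }

    // Checks the host part of each host:port entry, preserving input order
    static Map<String, Status> checkAll(String bootstrapServers) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (String raw : bootstrapServers.split(",")) {
            String entry = raw.trim();
            int colon = entry.lastIndexOf(':');
            String host = colon > 0 ? entry.substring(0, colon) : entry;
            result.put(entry, check(host));
        }
        return result;
    }

    public static void main(String[] args) {
        String servers = args.length > 0 ? args[0]
                : "test-kafka01.com:9092,test-kafka02.com:9092,test-kafka03.com:9092";
        checkAll(servers).forEach((entry, status) -> System.out.println(entry + " -> " + status));
    }
}
```

Pass the app's real bootstrap.servers value as the first argument. This only checks name resolution from the machine it runs on; it does not prove that a broker is reachable or healthy.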

  • There is indeed a DNS server in place, and we are using domain names instead of IPs in the broker configuration. Moreover, when I change the stream app id, everything works fine. But changing the id every time is not a good solution; it is only a temporary workaround, and the application eventually stops working again. – Himanshu Tomar May 17 '20 at 04:41