
We have seen topic creation (which we manage via terraform) fail intermittently. During the initial apply, terraform reports a connection error. On replan, terraform still wants to create the topics, but the second apply fails saying the topic already exists. When this happens, the zookeeper nodes know about the topic but the kafka brokers do not.

$ zook=zookeeper-N.FQDN:2181
$ broker=kafka-N.FQDN:6667
$ 
$ kafka-topics.sh --describe --zookeeper $zook --topic troubleSomeTopic
Topic: troubleSomeTopic  TopicId: x-ggFaJCRY6THYGvNjA20Q PartitionCount: 32      ReplicationFactor: 4    Configs: compression.type=snappy,retention.ms=86400000,segment.ms=86400000
/// verbose partition details
$ kafka-topics.sh --describe --bootstrap-server $broker --topic troubleSomeTopic
Error while executing topic command : Topic 'troubleSomeTopic' does not exist as expected
/// java backtrace

That example checks only one zookeeper and one broker, but all of them show the same results.
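To confirm that, a small loop can run the same describe against every node. This is only a sketch: the host names (`zookeeper-N.FQDN`, `kafka-N.FQDN`) and node counts match the placeholders above and would need to be substituted for real ones.

```shell
#!/bin/sh
# Sketch only: host names below are placeholders, substitute your own.
topic=troubleSomeTopic

# Helper: describe the topic through a given endpoint flag and host:port.
describe_via() {
  kafka-topics.sh --describe "$1" "$2" --topic "$topic"
}

# Every zookeeper should return the topic's metadata.
for n in 1 2 3; do
  echo "== zookeeper-$n =="
  describe_via --zookeeper "zookeeper-$n.FQDN:2181"
done

# Every broker should report the same "does not exist" error.
for n in 1 2 3 4 5 6; do
  echo "== kafka-$n =="
  describe_via --bootstrap-server "kafka-$n.FQDN:6667"
done
```

If every zookeeper shows the topic and every broker reports the error, the inconsistency is cluster-wide rather than a single bad node.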

This cluster has zookeeper version 3.5.9 running on three nodes, and kafka 2.8.0 (Scala 2.13 build, i.e. kafka_2.13-2.8.0) running on six. There is a similar question for an earlier version of kafka, but that method no longer works:

How to remove an inconsistent kafka topic metadata data from kafka_2.10-0.8.1.1

Cupcake Protocol

1 Answer


The fix we have found is to delete the topic via zookeeper, then restart the broker that is currently the kafka cluster controller.

$ kafka-topics.sh --delete --zookeeper $zook --topic troubleSomeTopic
$ zookeeper-shell.sh $zook get /controller 2>/dev/null |
           grep brokerid | jq -r .brokerid
10213
$ zookeeper-shell.sh $zook get /brokers/ids/10213 |
           tail -1 | jq -r .host
kafka-5.FQDN
$ 

That's the broker to restart.

Before the restart, the zookeepers still know about the topic, but show it as MarkedForDeletion: true. Restarting the controller broker clears the topic from zookeeper after a minute or two.
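After restarting the controller, you can poll zookeeper until the topic znode disappears before re-running terraform. A minimal sketch, assuming the same `$zook` and topic name as above (host name is a placeholder):

```shell
#!/bin/sh
# Sketch only: placeholder host name, substitute your own.
zook=zookeeper-N.FQDN:2181
topic=troubleSomeTopic

# Poll /brokers/topics for up to ~5 minutes; exit as soon as the
# topic is no longer listed (i.e. the deletion has completed).
for i in $(seq 1 30); do
  if ! zookeeper-shell.sh "$zook" ls /brokers/topics 2>/dev/null |
       grep -q "$topic"; then
    echo "topic gone from zookeeper"
    break
  fi
  sleep 10
done
```

Once the topic is gone from zookeeper, a fresh terraform apply can recreate it cleanly.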
