We have seen topic creation (which we manage with Terraform) fail intermittently. During the initial apply, Terraform reports a connection error. On the next plan it still wants to create the topics, but the second apply fails saying the topic already exists. When this happens, the ZooKeeper nodes know about the topic but the Kafka brokers do not.
$ zook=zookeeper-N.FQDN:2181
$ broker=kafka-N.FQDN:6667
$
$ kafka-topics.sh --describe --zookeeper $zook --topic troubleSomeTopic
Topic: troubleSomeTopic TopicId: x-ggFaJCRY6THYGvNjA20Q PartitionCount: 32 ReplicationFactor: 4 Configs: compression.type=snappy,retention.ms=86400000,segment.ms=86400000
/// verbose partition details
$ kafka-topics.sh --describe --bootstrap-server $broker --topic troubleSomeTopic
Error while executing topic command : Topic 'troubleSomeTopic' does not exist as expected
/// java backtrace
That example checks only one ZooKeeper node and one broker, but all of them show the same results.
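To find every topic in this state rather than checking one name at a time, the two topic listings can be diffed. This is only a minimal sketch: the helper name `topics_only_in_first` and the file names `zk.txt` and `broker.txt` are made up for illustration, and it assumes a Kafka version whose `kafka-topics.sh` still accepts `--zookeeper` (2.8.0 does).

```shell
#!/bin/sh
# Hypothetical helper: print names that appear in the first list but not
# in the second. Each argument is a file with one topic name per line.
topics_only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"      # comm needs sorted input
    sort -u "$2" > "$b"
    comm -23 "$a" "$b"       # keep lines unique to the first file
    rm -f "$a" "$b"
}

# Intended use, with $zook and $broker set as in the transcript above:
#   kafka-topics.sh --list --zookeeper "$zook" > zk.txt
#   kafka-topics.sh --list --bootstrap-server "$broker" > broker.txt
#   topics_only_in_first zk.txt broker.txt
```

Any name it prints is a topic that ZooKeeper lists but the broker does not report.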
This cluster runs ZooKeeper 3.5.9 on three nodes and Kafka 2.8.0 (the Scala 2.13 build, kafka_2.13-2.8.0) on six. There is a similar question for an earlier version of Kafka, but the method described there no longer works:
How to remove an inconsistent kafka topic metadata data from kafka_2.10-0.8.1.1