
Wondering how to recover from an unusual situation where ZooKeeper has the metadata for a topic (T_60036), but the brokers have no corresponding log files, causing producers to fail with the following exception:

kafka.common.FailedToSendMessageException

Below is what we noticed:

In zookeeper both /brokers/topics/T_60036 and /config/topics/T_60036 paths exist.
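(As a quick sanity check, listing the children of those paths with the standard ls command of zookeeper-shell also shows the topic znodes are present:)

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/zookeeper-shell.sh localhost:2181 ls /config/topics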

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/zookeeper-shell.sh localhost:2181 get /brokers/topics/T_60036/partitions/0/state
Connecting to localhost:2181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
{"controller_epoch":6,"leader":1,"version":1,"leader_epoch":0,"isr":[1,2]}
cZxid = 0x80013308c
ctime = Wed Jun 06 04:55:37 UTC 2018
mZxid = 0x80013308c
mtime = Wed Jun 06 04:55:37 UTC 2018
pZxid = 0x80013308c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 74
numChildren = 0

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/zookeeper-shell.sh localhost:2181 get /config/topics/T_60036
Connecting to localhost:2181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
{"version":1,"config":{}}
cZxid = 0x800132992
ctime = Wed Jun 06 04:55:13 UTC 2018
mZxid = 0x800132992
mtime = Wed Jun 06 04:55:13 UTC 2018
pZxid = 0x800132992
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 25
numChildren = 0

But there are no log files for this topic:

kafka@kafka-3:~$ ls -l /var/kafka/topics/T_60036*
ls: cannot access /var/kafka/topics/T_60036*: No such file or directory
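For completeness: partition directories are named <topic>-<partition> (e.g. T_60036-0) under whatever log.dirs points to in each broker's server.properties, so a check along these lines on every broker rules out the data simply living in another directory (the server.properties path below is just our install layout):

# log.dirs names the broker's data directories; the path to server.properties is our layout, adjust as needed
kafka@kafka-3:~$ grep '^log.dirs' /opt/kafka/kafka_2.10-0.8.1.1/config/server.properties
kafka@kafka-3:~$ ls -ld /var/kafka/topics/T_60036-*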

I did read the second comment about topic deletion here, but I am afraid it may destabilize the entire cluster. My question: is it safe to delete the orphan ZooKeeper entries ("/config/topics/T_60036" and "/brokers/topics/T_60036") from ZooKeeper without restarting or otherwise jeopardizing the cluster?
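(For reference, the manual removal I am asking about would look roughly like the following; rmr is the recursive delete of the ZooKeeper CLI, available in ZooKeeper 3.4+. This is only to make the question concrete, not something I have run:)

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/zookeeper-shell.sh localhost:2181
# rmr removes a znode and all of its children (ZooKeeper 3.4+; older CLIs only have the non-recursive delete)
rmr /brokers/topics/T_60036
rmr /config/topics/T_60036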

Here is our configuration

Version: kafka_2.10-0.8.1.1
Cluster Configuration: 4 kafka brokers + 4 zookeeper
Topic Partitions: 1
Topic Replicas: 2

2 Answers


This is what seems to have worked without bringing down the cluster:

First delete the corrupted topic using a hidden feature of 0.8.1.1

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/kafka-run-class.sh kafka.admin.DeleteTopicCommand --zookeeper localhost:2181 --topic T_60036
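(Before recreating, it is worth confirming the topic no longer shows up in ZooKeeper; listing the topics is one way to do that, an extra sanity check rather than a required step:)

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/kafka-topics.sh --list --zookeeper localhost:2181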

Re-create the topic

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/kafka-topics.sh --create --topic T_60036 --zookeeper localhost:2181 --partitions 1 --replication-factor 2
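Optionally, verify the recreated topic has a leader and the expected replicas before pointing producers at it again (a suggested check, not part of the original fix):

kafka@kafka-3:~$ /opt/kafka/kafka_2.10-0.8.1.1/bin/kafka-topics.sh --describe --topic T_60036 --zookeeper localhost:2181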

Just want to let folks know that if you try the proposed solution on a newer cluster version (tested on 2.8), using the hidden 0.8.1.1 feature kafka-run-class.sh kafka.admin.DeleteTopicCommand, it will lead to an inconsistent topic configuration state in ZooKeeper.

So I would recommend not doing it.

Maybe it worked for older versions, but it does not for 2.8.
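On newer versions the supported route is the regular delete option of kafka-topics.sh (assuming delete.topic.enable is on, which is the default in recent releases); for example, against a 2.8 broker (the broker address below is illustrative):

# Standard topic deletion on recent Kafka versions; replace localhost:9092 with your broker address
kafka-topics.sh --delete --topic T_60036 --bootstrap-server localhost:9092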
