
So I've been using Kafka 3.1.0 in a production environment. One of the VMs had to be live migrated, but the live migration failed and the node was forcefully migrated, involving a full VM restart.

After the VM booted up, Kafka stopped working "completely" - clients were not able to connect or produce/consume anything. JMX metrics were still showing up, but that node reported many partitions as "Offline partitions".

Looking into the logs, that particular node kept showing A LOT of INCONSISTENT_TOPIC_ID errors. Example:

WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-2. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)
WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition my-topic-3. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)

However, the other Kafka brokers were showing slightly different errors (I don't have a log sample) - UNKNOWN_TOPIC_ID...

Another interesting issue - I've described the Kafka topic and this is what I got:

Topic: my-topic        TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 4       ReplicationFactor: 4    Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
        Topic: my-topic        Partition: 0    Leader: 2       Replicas: 5,2,3,0       Isr: 2
        Topic: my-topic        Partition: 1    Leader: 0       Replicas: 0,1,2,3       Isr: 0
        Topic: my-topic        Partition: 2    Leader: 2       Replicas: 1,2,3,4       Isr: 2
        Topic: my-topic        Partition: 3    Leader: 2       Replicas: 2,3,4,5       Isr: 2

Why does it show only 1 ISR when there should be 4 per partition? Why did it happen in the first place?
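For reference, partitions whose ISR has shrunk below the replication factor can also be listed cluster-wide with kafka-topics.sh; the bootstrap address below is only an example:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions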

I've added an additional partition (roughly as in the sketch after the output below) and this is what it shows now:

Topic: my-topic        TopicId: XXXXXXXXXXXXXXXXXXXXXX PartitionCount: 5       ReplicationFactor: 4    Configs: segment.bytes=214748364,unclean.leader.election.enable=true,retention.bytes=214748364
        Topic: my-topic        Partition: 0    Leader: 2       Replicas: 5,2,3,0       Isr: 2
        Topic: my-topic        Partition: 1    Leader: 0       Replicas: 0,1,2,3       Isr: 0
        Topic: my-topic        Partition: 2    Leader: 2       Replicas: 1,2,3,4       Isr: 2
        Topic: my-topic        Partition: 3    Leader: 2       Replicas: 2,3,4,5       Isr: 2
        Topic: my-topic        Partition: 4    Leader: 3       Replicas: 3,4,5,0       Isr: 3,4,5,0
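Adding a partition to an existing topic is normally done with kafka-topics.sh --alter, along these lines (the bootstrap address is just an example):

kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic my-topic --partitions 5
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic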

I know there is the kafka-reassign-partitions.sh script, and it fixed a similar issue in the preproduction environment, but I am more interested in why it happened in the first place.
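For anyone unfamiliar with that script, a reassignment is driven by a JSON file describing the desired replica placement and is then executed and verified, roughly like this (file name, topic, broker IDs and bootstrap address are illustrative only):

cat > reassign.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my-topic","partition":0,"replicas":[5,2,3,0]},
  {"topic":"my-topic","partition":1,"replicas":[0,1,2,3]}
]}
EOF

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file reassign.json --execute
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file reassign.json --verify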

Could this be related? I've set the parameter replica.lag.time.max.ms=5000 (over the default of 500), and even after restarting all nodes it didn't help.

Erikas

2 Answers


This normally happens when the topic ID in the session does not match the topic ID in the log. To fix this issue you will have to make sure that the topic ID remains consistent across your cluster.

If you are using ZooKeeper, run this command in zkCli.sh on one of the nodes that is still in sync, and note the topic_id -

[zk: localhost:2181(CONNECTED) 10] get /brokers/topics/my-topic
{"partitions":{"0":[5,1,2],"1":[5,1,2],"2":[5,1,2],"3":[5,1,2],"4":[5,1,2],"5":[5,1,2],"6":[5,1,2],"7":[5,1,2],"8":[5,1,2],"9":[5,1,2]},"topic_id":"s3zoLdMp-T3CIotKlkBpMgL","adding_replicas":{},"removing_replicas":{},"version":3}

Next, on each node, check the partition.metadata file for every partition of the topic my-topic. This file lives in each partition's directory under log.dirs (see server.properties).

For example, if log.dirs is set to /media/kafka-data, you can find it at -

/media/kafka-data/my-topic-1/partition.metadata for partition 1.

/media/kafka-data/my-topic-2/partition.metadata for partition 2, and so on.

The contents of the file may look like this (note that it matches the topic_id that ZooKeeper has) -

version: 0
topic_id: s3zoLdMp-T3CIotKlkBpMgL

You'll need to make sure that the value of topic_id in all the partition.metadata files for my-topic across your cluster is the same. If you come across a different topic ID in any of the partitions, you can edit it with any text editor (or write a script to do this for you, as in the sketch below).
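A minimal sketch of such a script, assuming the /media/kafka-data layout from above (the log directory and the expected topic_id are placeholders, and the broker should be stopped before its files are edited) -

#!/usr/bin/env bash
# Check (and optionally fix) the topic_id recorded for my-topic on this broker.
LOG_DIR=/media/kafka-data                # log.dirs of this broker
EXPECTED_ID=s3zoLdMp-T3CIotKlkBpMgL      # topic_id taken from ZooKeeper

for f in "$LOG_DIR"/my-topic-*/partition.metadata; do
    current=$(awk '/^topic_id: /{print $2}' "$f")
    if [ "$current" != "$EXPECTED_ID" ]; then
        echo "Mismatch in $f: $current"
        # Uncomment to rewrite the file in place:
        # sed -i "s/^topic_id: .*/topic_id: $EXPECTED_ID/" "$f"
    fi
done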

Once done, you may need to restart your brokers one at a time for this change to take effect.

Johnny

I will try to answer why the topic Id in ZooKeeper may differ from the topic Id stored in partition.metadata:

In certain situations, it is possible that the TopicId stored locally on a broker for a topic differs from the TopicId stored for that topic in Zk. Currently, such a situation arises when users use a <2.8 client to alter partitions for a topic on >=2.8 brokers (including the latest 3.4) AND they use the --zookeeper flag from the client. Note that --zookeeper has been marked deprecated for a long time and has been replaced by --bootstrap-server, which doesn't face this problem.
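In other words, the problematic and the recommended invocations look roughly like this (topic name, hosts and partition count are placeholders) -

# pre-2.8 client talking to ZooKeeper directly - this path can overwrite the znode and lose the topic_id
kafka-topics.sh --zookeeper zk-host:2181 --alter --topic my-topic --partitions 5

# client talking to the brokers - this path keeps the topic_id intact
kafka-topics.sh --bootstrap-server broker-host:9092 --alter --topic my-topic --partitions 5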

This topic Id discrepancy leads to availability loss for the topic until the user performs the mitigation steps listed in KAFKA-14190.

The exact sequence of steps is:

  1. User uses a pre-2.8 client to create a new topic in ZooKeeper directly.
  2. No TopicId is generated in ZooKeeper.
  3. KafkaController listens to the ZNode and a TopicChange event is created. While handling this event, the controller notices that there is no TopicId, generates a new one, and updates Zk.
  4. At this stage, Zk has a TopicId.
  5. User uses the pre-2.8 client to increase the number of partitions for this topic.
  6. The client replaces/overwrites the entire existing ZNode with the new placement information. This deletes the existing TopicId in Zk (the one created by the controller in step 3).
  7. The next time KafkaController interacts with this ZNode, it generates a new TopicId.
  8. Note that we now have two different TopicIds for this topic name.
  9. Brokers may have a different (older) TopicId in their metadata files and will complain about the mismatch when they encounter the new TopicId.

Divij