
I'm running a Kafka cluster on 3 EC2 instances. Each instance runs Kafka (0.11.0.1) and ZooKeeper (3.4). My topics are configured with 20 partitions and a replication factor of 3.

Today I noticed that some partitions refuse to sync to all three nodes. Here's an example:

bin/kafka-topics.sh --zookeeper "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181" --describe --topic prod-decline
Topic:prod-decline    PartitionCount:20    ReplicationFactor:3    Configs:
    Topic: prod-decline    Partition: 0    Leader: 2    Replicas: 1,2,0    Isr: 2
    Topic: prod-decline    Partition: 1    Leader: 2    Replicas: 2,0,1    Isr: 2
    Topic: prod-decline    Partition: 2    Leader: 0    Replicas: 0,1,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 3    Leader: 1    Replicas: 1,0,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 4    Leader: 2    Replicas: 2,1,0    Isr: 2
    Topic: prod-decline    Partition: 5    Leader: 2    Replicas: 0,2,1    Isr: 2
    Topic: prod-decline    Partition: 6    Leader: 2    Replicas: 1,2,0    Isr: 2
    Topic: prod-decline    Partition: 7    Leader: 2    Replicas: 2,0,1    Isr: 2
    Topic: prod-decline    Partition: 8    Leader: 0    Replicas: 0,1,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 9    Leader: 1    Replicas: 1,0,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 10    Leader: 2    Replicas: 2,1,0    Isr: 2
    Topic: prod-decline    Partition: 11    Leader: 2    Replicas: 0,2,1    Isr: 2
    Topic: prod-decline    Partition: 12    Leader: 2    Replicas: 1,2,0    Isr: 2
    Topic: prod-decline    Partition: 13    Leader: 2    Replicas: 2,0,1    Isr: 2
    Topic: prod-decline    Partition: 14    Leader: 0    Replicas: 0,1,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 15    Leader: 1    Replicas: 1,0,2    Isr: 2,0,1
    Topic: prod-decline    Partition: 16    Leader: 2    Replicas: 2,1,0    Isr: 2
    Topic: prod-decline    Partition: 17    Leader: 2    Replicas: 0,2,1    Isr: 2
    Topic: prod-decline    Partition: 18    Leader: 2    Replicas: 1,2,0    Isr: 2
    Topic: prod-decline    Partition: 19    Leader: 2    Replicas: 2,0,1    Isr: 2
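For reference, the same tool can list only the partitions whose ISR is smaller than the replica set, which is a convenient check while debugging (this filter flag is available in the 0.11 CLI):

```shell
# Show only under-replicated partitions instead of the full topic listing
bin/kafka-topics.sh --zookeeper "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181" \
  --describe --under-replicated-partitions
```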

Only node 2 has all the data in-sync. I've tried restarting brokers 0 and 1 but it didn't improve the situation - it made it even worse. I'm tempted to restart node 2 but I'm assuming it will lead to downtime or cluster failure so I'd like to avoid it if possible.

I'm not seeing any obvious errors in logs so I'm having a hard time figuring out how to debug the situation. Any tips would be greatly appreciated.

Thanks!

EDIT: Some additional info ... If I check the metrics on node 2 (the one with full data), it does realize that some partitions are not correctly replicated:

$>get -d kafka.server -b kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions *
#mbean = kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions:
Value = 930;

Nodes 0 and 1 don't. They seem to think everything is fine:

$>get -d kafka.server -b kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions *
#mbean = kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions:
Value = 0;

Is this expected behaviour?

OneCricketeer
Akarot

1 Answer


Try increasing replica.lag.time.max.ms.

Explanation goes like this:

If a replica fails to send a fetch request for longer than replica.lag.time.max.ms, it is considered dead and is removed from the ISR.

If a replica starts lagging behind the leader for longer than replica.lag.time.max.ms, it is considered too slow and is removed from the ISR. So even if there is a spike in traffic and large batches of messages are written to the leader, the replica will not shuffle in and out of the ISR unless it consistently remains behind the leader for replica.lag.time.max.ms.
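Concretely, this is set in each broker's server.properties. The 30000 ms value below is only an illustration of raising it; the default in 0.11 is 10000 ms, and the right value depends on how far your followers actually lag:

```properties
# server.properties on every broker
# Followers that have not caught up to the leader within this window
# are dropped from the ISR (default: 10000 ms).
replica.lag.time.max.ms=30000
```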

mukesh210
  • Thanks for the answer and explanation @Mukesh prajapati! Is it enough to apply these settings on replicas that are out of sync and not on the leader? – Akarot Jun 27 '18 at 13:23
  • This parameter needs to be set in the broker config. Refer to the [Kafka documentation](https://kafka.apache.org/documentation/). I think you need to apply it to all brokers. – mukesh210 Jun 27 '18 at 13:36
  • I see. But if I restart the only broker still in-sync, the cluster will experience downtime, right? Can this be avoided in any way? Can I set the parameter live? Also, what kind of value would you recommend? Thanks! – Akarot Jun 27 '18 at 13:41
  • Unfortunately this is a broker config, not a per-topic config, so the broker has to be restarted for the change to take effect. – mukesh210 Jun 27 '18 at 13:59
  • What would a sane, but still big enough, value be? I can see the default is 10 seconds, which seems plenty already. – Akarot Jun 27 '18 at 15:22
  • I've just tried it at 30000 (30s) and it seems to have indeed solved my issue (it used to be at the 10s default too). – dkurzaj Nov 21 '18 at 16:28