
I have a Kafka consumer group of 10 application instances written in Java using Spring Cloud Stream. The consumers are deployed in a Kubernetes cluster on AWS and use the default consumer configuration (for example, max poll interval = 5 minutes). Everything works without issues until one of the pods is killed or evicted by its node for whatever reason. When that happens, a new pod is added without any problem, but the consumer group is disturbed and goes into an infinite loop of rebalancing with the errors below:

Triggering the followup rebalance scheduled for 0 ms

Request joining group due to: rebalance enforced by user.

SyncGroup failed: The group began another rebalance. Need to re-join the group. Sent generation was Generation.

My expectation was that if a pod is killed or evicted, the new pod joins the consumer group, a single rebalance occurs, and everything works normally afterwards. That is not what happens here. Any help is much appreciated.

Update: The errors above recur every 5 minutes (the same as the max poll interval).

I'm using Kafka 3.0.1, Spring Kafka 2.8.4, and Spring Cloud Stream 3.2.1.

Please let me know if more information is needed.

Jagadeesh
  • See if `group.initial.rebalance.delay.ms` helps you: https://stackoverflow.com/questions/56561378/how-to-introduce-delay-in-rebalancing-in-case-of-kafka-consumer-group – Artem Bilan May 20 '22 at 13:30

1 Answer


Check whether your consumer poll timeout is greater than group.initial.rebalance.delay.ms.

For example, suppose group.initial.rebalance.delay.ms is 15 seconds and the consumer poll timeout is 1 second. The consumer group rebalance will wait 15 seconds for the new member to join. Because the poll timeout is only 1 second, there won't be any consumer group members left at the end of those 15 seconds, so the consumer group keeps rebalancing.
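
As a rough sketch of the comparison described above, the following Java snippet checks whether a poll timeout exceeds the initial rebalance delay. The class and method names are hypothetical, and the two durations are hard-coded from the example rather than read from a live broker or consumer, so treat it as an illustration of the check, not as a diagnostic tool.

```java
import java.time.Duration;

// Hypothetical helper: not part of Kafka or Spring, only illustrates the comparison.
class RebalanceDelayCheck {

    // Returns true when the poll timeout is strictly longer than the
    // broker's group.initial.rebalance.delay.ms value.
    static boolean pollTimeoutExceedsDelay(Duration pollTimeout, Duration initialRebalanceDelay) {
        return pollTimeout.compareTo(initialRebalanceDelay) > 0;
    }

    public static void main(String[] args) {
        // Values taken from the example: 15 s rebalance delay, 1 s poll timeout.
        Duration initialRebalanceDelay = Duration.ofSeconds(15);
        Duration pollTimeout = Duration.ofSeconds(1);

        System.out.println("poll timeout exceeds rebalance delay: "
                + pollTimeoutExceedsDelay(pollTimeout, initialRebalanceDelay));
    }
}
```

With the example values the check fails, which is the situation the answer suggests looking for.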

Jebus
  • I have one more question on the same topic. When I checked the value of `group.initial.rebalance.delay.ms`, it is 3 seconds, and I have `heartbeat.interval.ms` set to 3 seconds. Could you please tell me which property I need to change to increase the consumer poll timeout? I'm using Kafka Streams. – Jagadeesh Jun 08 '22 at 04:31