
Our Kafka machines are installed as part of the Hortonworks (HDP) packages; the Kafka version is 0.1X.

We run the deeg_data applications, which consume data from Kafka topics.

Over the last few days we saw that our deeg_data applications were failing, and we started looking for the root cause.

On the Kafka cluster we see the following behavior:

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0
GC log rotation is turned off
Consumer group ‘deeg_data’ is rebalancing

From the Kafka side, the cluster is healthy: all topics are balanced, and all Kafka brokers are up and registered correctly in ZooKeeper.

After some time (a couple of hours), we ran the same command again. This time it did not report "Consumer group ‘deeg_data’ is rebalancing", and we got the expected results:

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh --group deeg_data --describe --bootstrap-server kafka1:6667
To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0
GC log rotation is turned off
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
deeg_data pot.sdr.proccess 0 6397256247 6403318505 6062258 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 1 6397329465 6403390955 6061490 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 2 6397314633 6403375153 6060520 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 3 6397258695 6403320788 6062093 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 4 6397316230 6403378448 6062218 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 5 6397325820 6403388053 6062233 consumer-1_/10.3.6.237
.
.
.
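When reading the output above, LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET for each partition. The Python below is a minimal sketch (the names `parse_describe` and `lag_by_owner` are hypothetical, not part of Kafka's tooling) that parses rows in the whitespace-separated format shown above, checks that relationship, and totals the lag per OWNER. It assumes the column layout printed by this version of `kafka-consumer-groups.sh`, and the sample contains only the six rows shown in the question.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PartitionLag(NamedTuple):
    group: str
    topic: str
    partition: int
    current_offset: int
    log_end_offset: int
    lag: int
    owner: str


def parse_describe(text: str) -> List[PartitionLag]:
    """Parse the per-partition rows printed by `kafka-consumer-groups.sh --describe`.

    Header lines, JVM/GC notices and '.' placeholder lines are skipped: only
    rows with exactly seven columns and a numeric PARTITION column are kept.
    """
    rows = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 7 or not cols[2].isdigit():
            continue
        group, topic, part, cur, end, lag, owner = cols
        rows.append(PartitionLag(group, topic, int(part), int(cur), int(end), int(lag), owner))
    return rows


def lag_by_owner(rows: List[PartitionLag]) -> Dict[str, int]:
    """Sum LAG per OWNER, checking LAG == LOG-END-OFFSET - CURRENT-OFFSET."""
    totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        if r.lag != r.log_end_offset - r.current_offset:
            raise ValueError(f"inconsistent lag for partition {r.partition}")
        totals[r.owner] += r.lag
    return dict(totals)


SAMPLE = """\
GC log rotation is turned off
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
deeg_data pot.sdr.proccess 0 6397256247 6403318505 6062258 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 1 6397329465 6403390955 6061490 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 2 6397314633 6403375153 6060520 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 3 6397258695 6403320788 6062093 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 4 6397316230 6403378448 6062218 consumer-1_/10.3.6.237
deeg_data pot.sdr.proccess 5 6397325820 6403388053 6062233 consumer-1_/10.3.6.237
"""

rows = parse_describe(SAMPLE)
print(len(rows), lag_by_owner(rows))
# 6 {'consumer-1_/10.3.6.237': 36370812}
```

In the rows shown, every partition is owned by the same client (consumer-1 on 10.3.6.237), and each partition is roughly six million messages behind.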

So we want to understand why we got:

Consumer group ‘deeg_data’ is rebalancing

What is the reason for the above state, and why does the group rebalance?

We also found a good post (https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/), which says:

Group rebalancing

Consumer group rebalancing is triggered when partitions need to be reassigned among consumers in the consumer group: a new consumer joins the group; an existing consumer leaves the group; an existing consumer changes subscription; or partitions are added to one of the subscribed topics.
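The four triggers in the quote can be expressed as a comparison between two snapshots of the group. The sketch below is a hypothetical Python model (`GroupState` and `rebalance_reasons` are not Kafka APIs; in a real cluster the group coordinator detects these changes itself). Note that a consumer which crashes or times out is removed by the coordinator, which shows up exactly like "an existing consumer leaves the group".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class GroupState:
    subscriptions: Dict[str, FrozenSet[str]]  # member id -> subscribed topics
    partitions: Dict[str, int]                # topic -> partition count


def rebalance_reasons(old: GroupState, new: GroupState) -> List[str]:
    """List which of the four rebalance triggers fired between two snapshots."""
    reasons: List[str] = []
    old_members, new_members = set(old.subscriptions), set(new.subscriptions)
    reasons += [f"consumer joined: {m}" for m in sorted(new_members - old_members)]
    reasons += [f"consumer left: {m}" for m in sorted(old_members - new_members)]
    reasons += [
        f"subscription changed: {m}"
        for m in sorted(old_members & new_members)
        if old.subscriptions[m] != new.subscriptions[m]
    ]
    # Only topics somebody in the new group is subscribed to matter.
    subscribed = set().union(*new.subscriptions.values())
    reasons += [
        f"partitions added to: {t}"
        for t in sorted(subscribed)
        if new.partitions.get(t, 0) > old.partitions.get(t, 0)
    ]
    return reasons


# Hypothetical names: consumer-a, topic-x with 6 partitions.
before = GroupState({"consumer-a": frozenset({"topic-x"})}, {"topic-x": 6})
after = GroupState({}, {"topic-x": 6})  # consumer-a crashed or timed out
print(rebalance_reasons(before, after))  # ['consumer left: consumer-a']
```

An empty list means none of the four conditions changed, so no rebalance would be expected.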

Rebalancing is orchestrated by the group coordinator and it involves communication with all consumers in the group. To dive deeper into the consumer group rebalance protocol, see Everything You Always Wanted to Know About Kafka’s Rebalance Protocol But Were Afraid to Ask by Matthias J. Sax from Kafka Summit and The Magical Rebalance Protocol of Apache Kafka by Gwen Shapira.

Regarding consumer client code, some of the partitions assigned to it might be revoked during a rebalance. In the older version of the rebalancing protocol, called eager rebalancing, all partitions assigned to a consumer are revoked, even if they are going to be assigned to the same consumer again. With the newer protocol version, incremental cooperative rebalancing, only partitions that are reassigned to another consumer will be revoked. You can learn more about the new rebalancing protocol in this blog post by Konstantine Karantasis and this blog post by Sophie Blee-Goldman.
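The difference between the two protocols can be illustrated by computing which partitions each consumer must give up when the assignment changes. This is a simplified Python sketch with hypothetical consumer names, not client code. For context, incremental cooperative rebalancing for the plain consumer arrived in Kafka 2.4, so a 0.1X client uses the eager behavior.

```python
from typing import Dict, Set

Assignment = Dict[str, Set[int]]  # consumer id -> assigned partition numbers


def revoked_eager(old: Assignment, new: Assignment) -> Assignment:
    """Eager protocol: each consumer gives up everything it owned.

    `new` is unused on purpose: eager revocation does not depend on
    what the next assignment will be.
    """
    return {c: set(parts) for c, parts in old.items() if parts}


def revoked_cooperative(old: Assignment, new: Assignment) -> Assignment:
    """Incremental cooperative: only partitions moving away are revoked."""
    revoked: Assignment = {}
    for c, parts in old.items():
        moving = parts - new.get(c, set())
        if moving:
            revoked[c] = moving
    return revoked


old = {"consumer-a": {0, 1, 2, 3}}
new = {"consumer-a": {0, 1}, "consumer-b": {2, 3}}  # consumer-b joined
print(revoked_eager(old, new))        # {'consumer-a': {0, 1, 2, 3}}
print(revoked_cooperative(old, new))  # {'consumer-a': {2, 3}}
```

With eager rebalancing, consumer-a stops processing partitions 0 and 1 even though it gets them straight back; the cooperative variant leaves them untouched.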

Regardless of protocol version, when a partition is about to be revoked, the consumer has to make sure that record processing is finished and the offset is committed for that partition before informing the group coordinator that the partition can be safely reassigned.
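This "finish processing, then commit, then let the partition go" requirement is commonly met with a rebalance callback. The Java client exposes `ConsumerRebalanceListener.onPartitionsRevoked` for this and calls it on the consumer thread from inside `poll()`. The Python below is only a self-contained simulation of that pattern: `CommitOnRevokeListener` and the `commit` callable are hypothetical stand-ins, not a Kafka client API.

```python
from typing import Callable, Dict, Iterable


class CommitOnRevokeListener:
    """Hypothetical stand-in for a ConsumerRebalanceListener.

    Remembers the next offset to commit for each partition and commits the
    offsets of revoked partitions before they are given up.
    """

    def __init__(self, commit: Callable[[Dict[int, int]], None]) -> None:
        self._commit = commit               # e.g. a synchronous offset commit
        self.pending: Dict[int, int] = {}   # partition -> next offset to consume

    def record_processed(self, partition: int, offset: int) -> None:
        # Kafka convention: the committed offset is the offset of the *next*
        # record to read, so store offset + 1.
        self.pending[partition] = offset + 1

    def on_partitions_revoked(self, revoked: Iterable[int]) -> None:
        # Processing is synchronous in this sketch, so every polled record is
        # already handled; commit what we have for the revoked partitions.
        to_commit = {p: self.pending.pop(p) for p in revoked if p in self.pending}
        if to_commit:
            self._commit(to_commit)


committed: Dict[int, int] = {}
listener = CommitOnRevokeListener(committed.update)
listener.record_processed(0, 41)
listener.record_processed(1, 7)
listener.on_partitions_revoked([0])
print(committed, listener.pending)  # {0: 42} {1: 8}
```

Partition 1 was not revoked, so its offset stays pending and is committed later as usual.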

With automatic offset commit enabled in the thread per consumer model, you don’t have to worry about group rebalancing. Everything is done by the poll method automatically. However, if you disable automatic offset commit and commit manually, it’s your responsibility to commit offsets before the join group request is sent. You can do this in two ways:


Note: there is also a good video on YouTube: https://www.youtube.com/watch?v=QaeXDh12EhE

Note: a good Stack Overflow post on this is "Kafka Consumer Rebalancing takes too long".

Note: on the environment side, our ZooKeeper servers are installed on VMs that use non-SSD disks and show swap usage, so I think we should also consider this post: https://community.cloudera.com/t5/Community-Articles/Zookeeper-Sizing-and-Placement/ta-p/247885

jessica
  • Rebalances don't really care about the cluster health. Your consumer threads are dying or timing out. – OneCricketeer Dec 23 '21 at 16:18
  • @OneCricketeer, if the consumer threads are dying or timing out, what would you suggest next? Maybe tuning the Kafka client parameters, or something else? – jessica Dec 23 '21 at 16:33
  • @OneCricketeer, please see the quote I added to my question: "Consumer group rebalancing is triggered when partitions need to be reassigned among consumers in the consumer group". Does it mean that the topic partitions are not balanced across broker IDs? Could that be why the consumers are dying? – jessica Dec 23 '21 at 17:14
  • Another explanation might be that rebalancing happens when a new consumer starts consuming messages from this topic (but it is not clear how that could happen). – jessica Dec 23 '21 at 17:17
  • Only if new consumers are added to the same group – OneCricketeer Dec 24 '21 at 01:19
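A common way for a consumer to "time out" is to spend longer processing one batch of records than `max.poll.interval.ms` allows (client default 300000 ms since Kafka 0.10.1, which introduced the setting). The coordinator then drops the consumer from the group and rebalances. On clients older than 0.10.1, heartbeats are only sent from `poll()`, so the same budget applies to `session.timeout.ms` instead. The back-of-the-envelope check below is a hypothetical Python helper; the per-record processing time is an assumption you would have to measure in the deeg_data application.

```python
MAX_POLL_INTERVAL_MS = 300_000  # max.poll.interval.ms client default (since 0.10.1)
MAX_POLL_RECORDS = 500          # max.poll.records client default


def worst_case_batch_ms(max_poll_records: int, per_record_ms: float) -> float:
    """Time until the next poll() if a full batch arrives and every record
    takes per_record_ms to process."""
    return max_poll_records * per_record_ms


def max_records_that_fit(max_poll_interval_ms: int, per_record_ms: float,
                         safety: float = 0.5) -> int:
    """Largest max.poll.records whose worst-case batch uses at most a
    `safety` fraction of max.poll.interval.ms."""
    return int(max_poll_interval_ms * safety // per_record_ms)


per_record_ms = 1_000  # hypothetical: 1 s per record; measure your own
batch = worst_case_batch_ms(MAX_POLL_RECORDS, per_record_ms)
print(batch, batch > MAX_POLL_INTERVAL_MS)                        # 500000 True
print(max_records_that_fit(MAX_POLL_INTERVAL_MS, per_record_ms))  # 150
```

If the worst case exceeds the interval, the usual levers are lowering `max.poll.records`, raising `max.poll.interval.ms`, or making per-record processing faster.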

1 Answer


Rebalancing in Kafka is a protocol used by various components (Kafka Connect, Kafka Streams, Schema Registry, etc.) for various purposes.

In its simplest form, a rebalance is triggered whenever the metadata changes.

Now, the word "metadata" can have many meanings, for example:

  • In the case of a topic, its metadata could be the topic's partitions and/or replicas and where (on which broker) they are stored
  • In the case of a consumer group, it could be the number of consumers that are part of the group, the partitions they consume from, etc.

The above examples are by no means exhaustive, i.e. topics and consumer groups have more metadata, but I won't go into more detail here.

So, if there is any change in:

  • The number of partitions or replicas of a topic such as addition, removal or unavailability
  • The number of consumers in a consumer group such as addition or removal
  • Other similar changes...

A rebalance will be triggered. In the case of consumer group rebalancing, consumer applications need to be robust enough to cater for such scenarios.
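To see why a change in group membership forces partitions to move, the sketch below reproduces, for a single topic, the range strategy used by Kafka's `RangeAssignor`: consumers are sorted, each gets a contiguous range, and the first consumers get one extra partition when the count does not divide evenly. It is a simplified Python illustration, not the client code, and the partition count and consumer names are hypothetical.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Single-topic sketch of the range strategy (Kafka's RangeAssignor).

    Consumers are sorted; each gets num_partitions // len(consumers)
    contiguous partitions, and the first (num_partitions % len(consumers))
    consumers get one extra.
    """
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(6, ["consumer-a"]))
# {'consumer-a': [0, 1, 2, 3, 4, 5]}
print(range_assign(6, ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 1, 2], 'consumer-b': [3, 4, 5]}
```

Adding consumer-b moves partitions 3 to 5 away from consumer-a, which is why the group has to rebalance and why the application must be ready to lose and gain partitions at that point.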

So rebalances are a feature. However, in your case they appear to happen very frequently, so you may need to investigate the logs of your client application and of the cluster.

Following are a couple of references that might help:

  1. Rebalance protocol: a very good article on Medium on this subject
  2. Consumer rebalancing: another post on SO focusing on consumer rebalancing
Lalit
  • Do you think that unbalanced partitions in Kafka can cause "Consumer group is rebalancing"? – jessica Dec 23 '21 at 18:30
  • When you say "non-balance of partitions", do you mean data not evenly spread across partitions? If that is what you mean, then no: it will not have any effect on rebalancing, as each consumer will continue consuming from its dedicated partitions and will only change its partition assignment (if at all) when a rebalance is triggered. – Lalit Dec 23 '21 at 18:32
  • Also, regarding the ZooKeeper health check: could ZooKeeper also be part of the problem (Consumer group is rebalancing)? – jessica Dec 23 '21 at 18:33
  • No. ZooKeeper health has nothing to do with rebalancing. However, ZooKeeper does store some metadata, and only if that changes will a rebalance be triggered. – Lalit Dec 23 '21 at 18:35
  • What I mean is, for example, if we have a topic with 100 partitions and 5 brokers, a good balance would be for each broker ID to manage 20 partitions. – jessica Dec 23 '21 at 18:35
  • Yes. Kafka takes care of that automatically when you create a topic, so it will not have any effect on rebalancing. – Lalit Dec 23 '21 at 18:36
  • What we saw on ZooKeeper is this: we have 3 ZooKeeper servers on Linux, swap usage on those machines is high, and best practice says ZooKeeper data should not end up on disk through swap. That is why I mentioned ZooKeeper. – jessica Dec 23 '21 at 18:37
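The broker-side balance jessica describes (100 partitions on 5 brokers, about 20 each) can be checked by counting partition leaders per broker, for example from the Leader column that `kafka-topics.sh --describe` prints per partition. The Python below is a minimal sketch with hypothetical broker IDs and a hand-built partition-to-leader map; as Lalit notes, this balance is independent of consumer group rebalancing.

```python
from collections import Counter
from typing import Dict, Iterable


def leaders_per_broker(leaders: Dict[int, int]) -> Counter:
    """Count how many partitions each broker leads (partition -> leader id)."""
    return Counter(leaders.values())


def is_leader_balanced(leaders: Dict[int, int], brokers: Iterable[int],
                       tolerance: int = 1) -> bool:
    """True if the busiest and idlest broker differ by at most `tolerance` partitions."""
    counts = leaders_per_broker(leaders)
    per_broker = [counts.get(b, 0) for b in brokers]
    return max(per_broker) - min(per_broker) <= tolerance


brokers = [1, 2, 3, 4, 5]  # hypothetical broker ids
even = {p: brokers[p % 5] for p in range(100)}  # round-robin leaders
skewed = {p: 1 for p in range(100)}             # every partition led by broker 1
print(is_leader_balanced(even, brokers), is_leader_balanced(skewed, brokers))  # True False
```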