A consumer rebalance happens whenever the metadata of a consumer group changes, for example when a member joins or leaves the group.
Adding more consumers to a group (scaling, in your words) is one such change and triggers a rebalance. During the rebalance, partitions are re-assigned among the consumers, so a consumer cannot know which offsets to commit until the re-assignment is complete. The StickyAssignor does try to preserve the previous assignment as much as possible, but a rebalance is still triggered, and an even distribution of partitions takes precedence over retaining the previous assignment. (Reference: Kafka Documentation)
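To make the "balance first, stickiness second" trade-off concrete, here is a minimal sketch that simulates a sticky-style rebalance when a third consumer joins a group reading six partitions. This is a deliberately simplified, hypothetical algorithm (the class and method names are made up for illustration), not Kafka's actual StickyAssignor implementation: each consumer keeps its old partitions only up to its fair quota, and the leftovers are handed to whoever is below quota.

```java
import java.util.*;

/**
 * Hypothetical, simplified illustration of a sticky-style rebalance.
 * NOT Kafka's StickyAssignor: it only shows that an even split wins,
 * and previous ownership is kept only where the quota allows it.
 */
public class StickyRebalanceSketch {

    /** Returns consumer -> partitions after rebalancing `previous` onto `consumers`. */
    public static Map<String, List<Integer>> rebalance(Map<String, List<Integer>> previous,
                                                       List<String> consumers,
                                                       int numPartitions) {
        int n = consumers.size();
        int base = numPartitions / n;   // every consumer gets at least this many
        int extra = numPartitions % n;  // the first `extra` consumers get one more

        Map<String, List<Integer>> result = new LinkedHashMap<>();
        Set<Integer> assigned = new HashSet<>();

        // Pass 1: each consumer keeps previously owned partitions, up to its quota.
        for (int i = 0; i < n; i++) {
            String consumer = consumers.get(i);
            int quota = base + (i < extra ? 1 : 0);
            List<Integer> kept = new ArrayList<>();
            for (int p : previous.getOrDefault(consumer, List.of())) {
                if (kept.size() < quota && p < numPartitions && !assigned.contains(p)) {
                    kept.add(p);
                    assigned.add(p);
                }
            }
            result.put(consumer, kept);
        }

        // Pass 2: hand out the partitions nobody kept to consumers below quota.
        Deque<Integer> free = new ArrayDeque<>();
        for (int p = 0; p < numPartitions; p++) {
            if (!assigned.contains(p)) free.add(p);
        }
        for (int i = 0; i < n; i++) {
            List<Integer> owned = result.get(consumers.get(i));
            int quota = base + (i < extra ? 1 : 0);
            while (owned.size() < quota) owned.add(free.poll());
            Collections.sort(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = new LinkedHashMap<>();
        before.put("c1", List.of(0, 1, 2));
        before.put("c2", List.of(3, 4, 5));
        // Scaling out: a third consumer joins the group.
        Map<String, List<Integer>> after = rebalance(before, List.of("c1", "c2", "c3"), 6);
        System.out.println(after); // prints {c1=[0, 1], c2=[3, 4], c3=[2, 5]}
    }
}
```

Even with stickiness, c1 and c2 each give up a partition to the newcomer, which is why in-flight work on partitions 2 and 5 cannot simply be committed by their old owners once the rebalance starts.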
Beyond that, the exception's message is self-explanatory: while a rebalance is in progress, some operations are not allowed.
How can you avoid such situations?
This is a tricky one, because Kafka needs rebalancing to work effectively. There are a few practices you can use to reduce the unnecessary impact:
- Increase max.poll.interval.ms, the maximum time allowed between calls to poll() before the consumer is considered failed and removed from the group, so the possibility of experiencing these exceptions is reduced.
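- Decrease the amount of work returned by each poll, either the number of records (max.poll.records) or the data fetched per partition (max.partition.fetch.bytes), so each batch is processed well within max.poll.interval.ms.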
- Try to use the latest version(s) of Kafka (or upgrade if you're on an old one), as many recent releases have improved the rebalance protocol.
- Use the static membership protocol (a unique, stable group.instance.id per consumer instance) to reduce rebalances.
- Consider configuring group.initial.rebalance.delay.ms for empty consumer groups (either on the first deployment or when tearing everything down and redeploying).
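The client-side settings from the list above could be collected roughly as in the sketch below. The config keys are the real Kafka consumer property names, but the values, the group/instance ids, and the helper names are illustrative assumptions, not recommendations; required settings such as bootstrap.servers and the deserializers are omitted. Note that group.initial.rebalance.delay.ms is a broker setting (server.properties, default 3000 ms), so it does not belong in the consumer properties. The helper also does the back-of-the-envelope check behind the first two bullets: a full poll() batch must be processed within max.poll.interval.ms.

```java
import java.util.Properties;

/**
 * Hypothetical helper gathering the rebalance-related consumer settings.
 * Keys are real Kafka consumer config names; values are illustrative only.
 */
public class RebalanceFriendlyConsumerConfig {

    public static Properties consumerProps(String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Allow up to 10 minutes between poll() calls (the default is 5 minutes).
        props.setProperty("max.poll.interval.ms", "600000");
        // Fewer records per poll() so each batch finishes well within the interval.
        props.setProperty("max.poll.records", "100");
        // Cap the data fetched per partition per request (the default is 1 MB).
        props.setProperty("max.partition.fetch.bytes", "524288");
        // Static membership: a stable id that must be unique per consumer instance.
        props.setProperty("group.instance.id", instanceId);
        return props;
    }

    /** Rough check: does the worst-case time for one poll() batch fit in the interval? */
    public static boolean batchFitsInPollInterval(Properties props, long worstCaseMsPerRecord) {
        long records = Long.parseLong(props.getProperty("max.poll.records"));
        long intervalMs = Long.parseLong(props.getProperty("max.poll.interval.ms"));
        return records * worstCaseMsPerRecord < intervalMs;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("orders-service", "orders-consumer-1");
        // 100 records * 2000 ms = 200000 ms, under the 600000 ms interval.
        System.out.println(batchFitsInPollInterval(props, 2000)); // prints true
        // 100 records * 7000 ms = 700000 ms, over the interval.
        System.out.println(batchFitsInPollInterval(props, 7000)); // prints false
    }
}
```

If the check fails for your realistic worst case, either lowering max.poll.records or raising max.poll.interval.ms restores the margin; lowering the batch size also keeps failure detection reasonably fast.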
These techniques can only reduce unnecessary rebalances and the resulting exceptions; they will NOT prevent rebalances completely.