
Let's say that I have 10 partitions for a given topic in Kafka. What would my options be to automatically load balance these 10 partitions between consumers?

I have read this post https://stackoverflow.com/a/28580363/317384 but I'm not sure it covers what I'm looking for, or maybe I'm just not getting it.

If I spin up a worker with one consumer for each partition, all work would be consumed by that worker.

But what happens if I spin up another instance of the same worker elsewhere? Will the client libraries/Kafka somehow detect this and rebalance the load between the two workers, so that some of the active consumers on worker1 become idle and the corresponding consumers on worker2 become active?

I would like to be able to add and remove workers on demand, and spread the load across those, is that possible?

e.g. from this: [image not available]

to this: [image not available]

Roger Johansson

1 Answer


Kafka consumers are part of consumer groups. A group has one or more consumers in it. Within a group, each partition is assigned to exactly one consumer, and partitions are how Kafka scales out. If you have more consumers than partitions, some of your consumers will be idle. If you have more partitions than consumers, a single consumer may be assigned more than one partition.
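To make the assignment rule concrete, here is a minimal sketch in plain Python. It is a toy model and not a Kafka client: the function name `assign_partitions` and the worker names are made up for illustration, and the even, contiguous split is only an assumption about how a typical assignment strategy behaves. The real distribution depends on the strategy your client is configured with.

```python
def assign_partitions(partitions, consumers):
    """Toy model: spread partitions over consumers as evenly as possible.

    Hypothetical helper for illustration only; not part of any Kafka library.
    """
    members = sorted(consumers)  # assumption: a deterministic member order
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    # Every member gets `base` partitions; the first `extra` members get one more.
    base, extra = divmod(len(partitions), len(members))
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = list(partitions[start:start + count])
        start += count
    return assignment


partitions = list(range(10))

# One worker owns all 10 partitions.
one_worker = assign_partitions(partitions, ["worker1"])

# More partitions than consumers: each consumer owns several partitions.
two_workers = assign_partitions(partitions, ["worker1", "worker2"])

# More consumers than partitions: at least one consumer owns nothing.
crowded = assign_partitions([0, 1], ["c1", "c2", "c3"])

print(one_worker)
print(two_workers)
print(crowded)
```

In this model, the consumer left with an empty list is the idle one. It stays in the group and would pick up partitions on a later rebalance if another member went away.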

When a new consumer joins, a rebalance occurs, and the new consumer is assigned some partitions previously assigned to other consumers. In your case, if there were 10 partitions all being consumed by one consumer, and another consumer joins, there'll be a rebalance, and afterwards, there'll be (typically) five partitions per consumer.

It's worth noting that during a rebalance, consumption in the group "pauses". A similar rebalance happens when a consumer leaves gracefully, or when the group coordinator detects (through missed heartbeats) that a consumer has gone away.
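The join/leave behaviour can be sketched as a small simulation. Again, this is plain Python rather than a real client: the `ConsumerGroup` class, its `generation` counter, and the round-robin split are all assumptions chosen to keep the example short, and a real group negotiates its assignment through the brokers rather than in local memory.

```python
class ConsumerGroup:
    """Toy model of a consumer group that rebalances on every membership change.

    Hypothetical class for illustration only; not part of any Kafka library.
    """

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []
        self.assignment = {}
        self.generation = 0  # counts how many rebalances have happened

    def _rebalance(self):
        # Everything is reassigned from scratch: this is the "pause" in the text.
        self.generation += 1
        self.assignment = {member: [] for member in self.members}
        for index, partition in enumerate(self.partitions):
            if self.members:
                owner = self.members[index % len(self.members)]  # round-robin
                self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        self.members.remove(member)
        self._rebalance()


group = ConsumerGroup(range(10))

group.join("worker1")
after_first = dict(group.assignment)   # worker1 owns all 10 partitions

group.join("worker2")
after_second = dict(group.assignment)  # 5 partitions each

group.leave("worker1")
after_leave = dict(group.assignment)   # worker2 takes over everything

print(after_first)
print(after_second)
print(after_leave)
```

With a real client you would normally not write any of this yourself. Starting another worker with the same group id is what triggers the rebalance.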

ashic
  • Is consumer group support a client library thing, separate from Kafka itself? I'm not fully getting what they say here https://github.com/Shopify/sarama/issues/410 (the Go SDK that I'm using) – Roger Johansson Oct 30 '16 at 09:33
  • Cool. Consumer groups are pretty fundamental to Kafka. The clients are also quite smart - they take quite a bit off of the server by being so. I don't know much about the Go client, but I see no reason for it not following the other ones. – ashic Oct 30 '16 at 10:13
  • @ashic "If you have more consumers than partitions, then some of your consumers will be idle." Quoting your response, would like to know which consumer is idle and how it is decided? Who will decide which consumer will get the next event? – TechEnthusiast Sep 13 '17 at 13:38
  • Topics are divided into partitions, and consumers get partitions assigned to them (by the group coordinator, see https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/ ). Say there are 2 partitions and 3 consumers. With 2 consumers each holding one of the 2 partitions, the 3rd consumer has nothing assigned and is starved of messages. – ashic Sep 13 '17 at 16:48