
If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?

In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?

What if you have multiple consumers on a given topic#partition? I guess the consumer has to somehow keep track of what messages it has already processed in case of duplicates?

raksja
cool breeze

2 Answers


In fact, each consumer belongs to a consumer group. When the Kafka cluster delivers data to a consumer group, all records of a given partition go to a single consumer in that group.

If there are more partitions than consumers in a group, some consumers will consume data from more than one partition. If there are more consumers in a group than partitions, some consumers will get no data. If you add new consumer instances to the group, they will take over some partitions from the existing members. If you remove a consumer from the group (or the consumer dies), its partitions will be reassigned to the remaining members.
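To make the distribution concrete, here is a small, self-contained Java simulation. It is only a sketch: it assumes a simple round-robin strategy (similar in spirit to Kafka's `RoundRobinAssignor`, but not its actual implementation; the assignor a real client uses depends on its version and its `partition.assignment.strategy` setting). The class name, method name and consumer names are hypothetical, and no broker is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of how partitions could be spread over one consumer group.
// Not Kafka's API: it only mimics a round-robin assignment strategy.
public class GroupAssignmentSketch {

    // Deals partitions 0..numPartitions-1 out to the consumers like cards.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 2 consumers, 5 partitions: every partition still has an owner.
        System.out.println(assign(List.of("c1", "c2"), 5));
        // 4 consumers, 2 partitions: c3 and c4 stay idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 2));
    }
}
```

With two consumers and five partitions this prints `{c1=[0, 2, 4], c2=[1, 3]}`, so nothing is left unconsumed; with four consumers and two partitions, `c3` and `c4` get empty lists.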

Now let's take a look at your questions:

If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?

NO. Some consumers in the same consumer group will consume data from more than one partition.

In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?

Kafka will take care of it. If new consumers join the group, or existing consumers die, Kafka will rebalance the partitions across the group.
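A rough way to picture a rebalance is "recompute the assignment whenever membership changes". The Java sketch below does exactly that with a hypothetical round-robin helper; in the real client the group coordinator on the broker detects joins and departures (for example when a consumer stops heartbeating within `session.timeout.ms`), and the built-in assignors may move fewer partitions than this naive version does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a rebalance: the assignment is simply recomputed
// from the current member list. Not Kafka's API or its actual assignor.
public class RebalanceSketch {

    // Naive round-robin assignment of partitions 0..numPartitions-1.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String m : members) {
            assignment.put(m, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(members.get(p % members.size())).add(p);
        }
        return assignment;
    }

    // Counts owners per partition; after a rebalance every entry should be exactly 1.
    static int[] ownersPerPartition(Map<String, List<Integer>> assignment, int numPartitions) {
        int[] owners = new int[numPartitions];
        for (List<Integer> parts : assignment.values()) {
            for (int p : parts) {
                owners[p]++;
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> members = new ArrayList<>(List.of("c1", "c2", "c3"));
        System.out.println("before: " + assign(members, 6));
        members.remove("c2"); // c2 crashes or leaves the group
        Map<String, List<Integer>> after = assign(members, 6);
        System.out.println("after:  " + after);
        System.out.println(Arrays.toString(ownersPerPartition(after, 6)));
    }
}
```

After `c2` disappears, its partitions are picked up by `c1` and `c3`, and every partition still has exactly one owner, which is the property the answer describes.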

What if you have multiple consumers on a given topic#partition?

You CANNOT have multiple consumers in the same consumer group consuming data from a single partition. However, if there is more than one consumer group, the same partition can be consumed by one (and only one) consumer in each consumer group.
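The "one owner per partition per group" rule can be sketched by running the same hypothetical round-robin assignment independently inside each group. The group names `billing` and `analytics` are made up for the example; the point is that each group on its own covers every partition, so both groups see all the messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation: two independent consumer groups reading the same topic.
// Not Kafka's API; it only mimics per-group partition assignment.
public class MultiGroupSketch {

    // Simplified round-robin assignment, applied separately inside each group.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String m : members) {
            assignment.put(m, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(members.get(p % members.size())).add(p);
        }
        return assignment;
    }

    // Set of partitions owned by some member of the group.
    static Set<Integer> covered(Map<String, List<Integer>> assignment) {
        Set<Integer> all = new TreeSet<>();
        for (List<Integer> parts : assignment.values()) {
            all.addAll(parts);
        }
        return all;
    }

    public static void main(String[] args) {
        int partitions = 3;
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("billing", List.of("b1", "b2"));   // hypothetical group and member names
        groups.put("analytics", List.of("a1"));
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            Map<String, List<Integer>> a = assign(g.getValue(), partitions);
            System.out.println(g.getKey() + " -> " + a + " covers " + covered(a));
        }
    }
}
```

Partition 0, for example, is owned by `b1` in one group and by `a1` in the other, but never by two members of the same group.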

for_stack

1) No, it means one consumer will handle more than one partition.
2) Within a consumer group, Kafka never assigns the same partition to more than one consumer, because that would violate the ordering guarantee within a partition.
3) You could implement ConsumerRebalanceListener in your client code; it gets called whenever partitions are assigned to or revoked from the consumer.

You might want to take a look at this article, specifically the "Assigning partitions to consumers" part. In it I have a sample where you create a topic with 3 partitions and then a consumer with a ConsumerRebalanceListener that tells you which consumer is handling which partition. You can then play around with it by starting one or more consumers and seeing what happens. The sample code is on GitHub.

http://www.javaworld.com/article/3066873/big-data/big-data-messaging-with-kafka-part-2.html
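If you want to see the callback pattern from point 3 without running a broker, here is a stdlib-only Java simulation. It is a hypothetical stand-in, not the Kafka API: the interface borrows the method names of `ConsumerRebalanceListener` (`onPartitionsRevoked`, `onPartitionsAssigned`), but partitions are plain `int`s instead of `TopicPartition`, and the "group" is a local object that does an eager-style rebalance (everyone gives up its partitions, then receives a fresh round-robin share) each time a member joins.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Kafka's ConsumerRebalanceListener (same callback names only).
interface RebalanceCallbacks {
    void onPartitionsRevoked(Collection<Integer> partitions);
    void onPartitionsAssigned(Collection<Integer> partitions);
}

public class ListenerSketch {
    private final Map<String, RebalanceCallbacks> members = new LinkedHashMap<>(); // join order
    private final Map<String, List<Integer>> current = new HashMap<>();
    private final int numPartitions;

    ListenerSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Eager-style rebalance triggered by a new member joining.
    void join(String name, RebalanceCallbacks callbacks) {
        // 1. Existing members give up what they currently own.
        for (Map.Entry<String, RebalanceCallbacks> m : members.entrySet()) {
            m.getValue().onPartitionsRevoked(current.getOrDefault(m.getKey(), List.of()));
        }
        members.put(name, callbacks);
        // 2. Recompute a round-robin assignment over all members.
        List<String> names = new ArrayList<>(members.keySet());
        current.clear();
        for (String n : names) {
            current.put(n, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            current.get(names.get(p % names.size())).add(p);
        }
        // 3. Tell every member what it now owns.
        for (String n : names) {
            members.get(n).onPartitionsAssigned(current.get(n));
        }
    }

    // Listener that just records which consumer handles which partition.
    static RebalanceCallbacks logging(String name, List<String> log) {
        return new RebalanceCallbacks() {
            public void onPartitionsRevoked(Collection<Integer> ps) { log.add(name + " revoked " + ps); }
            public void onPartitionsAssigned(Collection<Integer> ps) { log.add(name + " assigned " + ps); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ListenerSketch group = new ListenerSketch(3);
        group.join("c1", logging("c1", log));
        group.join("c2", logging("c2", log)); // starting a second consumer
        log.forEach(System.out::println);
    }
}
```

Starting the second consumer makes `c1` log a revocation of `[0, 1, 2]` followed by a new assignment of `[0, 2]`, while `c2` is assigned `[1]`. That mirrors the kind of output the article's sample lets you observe against a real cluster, though real assignors and the newer cooperative protocol can behave differently.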

Sunil Patil