I have a multi-partition topic that is consumed by multiple consumers (in the same group). My goal is to maximize consumption throughput, i.e. any consumer should be able to consume messages from any partition.

I know this looks impossible, since only one consumer in a group can consume from a given partition.

Is it possible to use the REST Proxy to achieve this, for example by polling all of the proxy's consumer instances?

Thanks.

zwush
  • Any consumer can already consume from any partition... You're bounded by the number of partitions in the topic for which you run multiple applications to maximize consumption. What issues are you actually trying to solve? – OneCricketeer Jan 17 '20 at 21:11
  • Thanks for your reply. I am trying to avoid the situation where some consumers are idle while there are still messages in the topic. Is that possible? – zwush Jan 17 '20 at 22:15
  • Yes. If you run more consumer threads than partitions, then those extras are idle – OneCricketeer Jan 18 '20 at 02:06
  • Actually I don't want to have any idle consumers while there are pending messages in the topic. It is kind of like a pool of new messages consumable by all consumers. – zwush Jan 18 '20 at 07:13
  • Okay. Then start as many threads or separate applications as there are partitions, and no more. I'm still not sure I understand your problem. – OneCricketeer Jan 18 '20 at 07:34
  • Thanks. But this is not what I am looking for. Suppose there is a fixed number of consumers. Is there a way to let them compete for the new messages from all partitions? Order is not important in my case. – zwush Jan 18 '20 at 07:57
  • That depends. Does each consumer have a unique `group.id`? Are you using `subscribe` or `assign`? How many partitions are there, really? – OneCricketeer Jan 18 '20 at 08:00
  • Consumers are within the same group. There are 10 consumers and 10 partitions. I am using subscribe. – zwush Jan 18 '20 at 08:17
  • Then you are guaranteed one consumer per partition and the maximum processing throughput you can get – OneCricketeer Jan 18 '20 at 09:21
  • In my case some consumers are very slow compared to the others. The messages are evenly distributed across all partitions. At some point the fast consumers will be starving while the slower ones still have a lot of lag to catch up on, so the throughput drops. Is there a way to guarantee that the throughput is either maximized or zero (when the topic has zero lag)? – zwush Jan 18 '20 at 09:43
  • There is not a guarantee. There is some other bottleneck in your system, but it seems to be external to Kafka. Are those slow consumers running on their own hardware? Different machines than the fast ones? – OneCricketeer Jan 18 '20 at 09:53
  • The consumer's performance is dynamically tuned by configuration. Most of the time they are very different. It is by design, unfortunately. – zwush Jan 18 '20 at 10:04
  • Dynamic? So you're closing consumer objects, changing their properties, then reopening them? Seems error prone if you're not careful about offset handling – OneCricketeer Jan 18 '20 at 16:32
  • The consumer's performance is dynamically changed on the fly. So they will always listen to the same partitions. – zwush Jan 20 '20 at 09:50
  • You can do that using the assign method. There's no need to change anything at runtime – OneCricketeer Jan 20 '20 at 13:29
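The starvation described in the comments can be put into numbers with a back-of-envelope model. This is a sketch under stated assumptions, not Kafka code: each consumer processes at a constant rate, messages are spread evenly over partitions, there is no rebalancing, and the "shared pool" case is a hypothetical upper bound that a Kafka consumer group does not actually provide. The function names and the example rates are made up for illustration.

```python
def drain_time_fixed(msgs_per_partition: int, rates: list[float]) -> float:
    """One consumer per partition (subscribe, same group): the topic is only
    empty once the slowest consumer has drained its own partition."""
    return max(msgs_per_partition / r for r in rates)


def drain_time_shared_pool(msgs_per_partition: int, rates: list[float]) -> float:
    """Hypothetical upper bound: every consumer pulls from one shared pool,
    so all consumers stay busy until the last message is taken."""
    total = msgs_per_partition * len(rates)
    return total / sum(rates)


if __name__ == "__main__":
    # Assumed numbers: 10 partitions of 1000 messages each,
    # 5 fast consumers (100 msg/s) and 5 slow ones (10 msg/s).
    rates = [100.0] * 5 + [10.0] * 5
    print(drain_time_fixed(1000, rates))        # 100.0 seconds
    print(drain_time_shared_pool(1000, rates))  # about 18.2 seconds
```

With these assumed rates the fast consumers finish their partitions after 10 seconds and then sit idle for the remaining 90 seconds, which is exactly the gap between the two numbers.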

2 Answers

By default (when you use subscribe()), a Kafka consumer is assigned as many of the topic's partitions as its group allows: a lone consumer gets all of them. If you run multiple consumers on the same topic with the same consumer group ID, Kafka automatically distributes the topic's partitions across all of those consumers. This is by design, so you can scale consumption quickly by adding more consumers, up to one consumer per partition.
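To see why adding consumers beyond the partition count does not help, here is a simplified round-robin distribution of partitions over the consumers of one group. It is only an illustration: Kafka's real assignors (range, round-robin, sticky, cooperative sticky) differ in detail, and `round_robin_assign` is a hypothetical helper, not a Kafka API.

```python
def round_robin_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Simplified round-robin: the i-th partition goes to consumer i mod len(consumers).
    Kafka's built-in assignors follow different, more elaborate rules."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    # 10 partitions but 12 consumers in the same group.
    consumers = [f"consumer-{i}" for i in range(12)]
    plan = round_robin_assign(list(range(10)), consumers)
    idle = [c for c, parts in plan.items() if not parts]
    print(idle)  # ['consumer-10', 'consumer-11']
```

Each partition has exactly one owner in the group, so any consumer past the tenth receives nothing and stays idle.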

You can, optionally, instruct the Kafka consumer to consume only from specific partitions, even down to a single one, but you have to do that explicitly (for example with assign() instead of subscribe()).
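With explicit assignment you decide the partition lists yourself, so one option for consumers of very different speeds is to give faster consumers more partitions. The sketch below assumes you have some relative speed estimate per consumer (the `weights` list is hypothetical, e.g. measured messages per second) and greedily hands each partition to the consumer whose load per unit of weight would be lowest. Keep in mind that partitions assigned manually with assign() do not take part in the group's automatic rebalancing, so if a consumer dies or its speed changes, the plan has to be recomputed and reapplied by you.

```python
def weighted_static_plan(num_partitions: int, weights: list[float]) -> list[list[int]]:
    """Hypothetical helper: split partitions roughly in proportion to each
    consumer's weight. The result could be fed to each consumer's assign()."""
    plan: list[list[int]] = [[] for _ in weights]
    for p in range(num_partitions):
        # Pick the consumer whose (partitions after taking this one) / weight is smallest;
        # ties go to the lowest consumer index.
        best = min(range(len(weights)), key=lambda c: (len(plan[c]) + 1) / weights[c])
        plan[best].append(p)
    return plan


if __name__ == "__main__":
    # Assumed: consumer 0 is about three times faster than consumer 1.
    print(weighted_static_plan(8, [3.0, 1.0]))  # [[0, 1, 2, 4, 5, 6], [3, 7]]
```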

mjuarez

The best way to maximize consumption throughput is to have one consumer (in the same group) reading from each partition.

To improve throughput further, you may also review:

  • The number of partitions: you could increase it so that you can add more consumers and increase throughput
  • How messages are balanced across partitions: a bad key selection can send all messages to the same partition
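Keyed messages are mapped to a partition by hashing the key modulo the partition count, so a low-cardinality key concentrates load on a few partitions. The sketch below uses `zlib.crc32` purely as a stand-in hash; Kafka's default partitioner uses murmur2, so the actual partition numbers will differ, but the skew effect is the same. `partition_for` and `partition_histogram` are hypothetical helpers.

```python
import zlib
from collections import Counter


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash for illustration only: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


def partition_histogram(keys: list[bytes], num_partitions: int) -> list[int]:
    """Count how many messages would land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    skewed = partition_histogram([b"tenant-1"] * 1000, 10)
    spread = partition_histogram([f"order-{i}".encode() for i in range(1000)], 10)
    print(skewed)  # every message lands in a single partition
    print(spread)  # messages are spread over the partitions
```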

Also, as a reminder, within a consumer group only one consumer is allowed per partition, to avoid concurrency issues. What would happen if two consumers committed different offsets? You would end up reading some messages twice or skipping some of them!
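The duplicate/skip problem comes from the group storing a single committed offset per partition, which cannot describe the progress of two readers at once. Here is a toy model of both outcomes; it is plain Python, not Kafka client code, and the two consumers "A" and "B", the offsets, and the function names are all hypothetical.

```python
from collections import Counter


def duplicates_scenario(log_size: int) -> Counter:
    """A and B share one partition; both start from committed offset 0."""
    processed: Counter = Counter()
    committed = 0                                  # one committed offset per (group, partition)
    processed.update(range(committed, 8))          # B processes offsets 0-7 ...
    committed = 8                                  # ... and commits 8
    processed.update(range(0, 5))                  # A, also started at 0, processes 0-4 again
    committed = 5                                  # A commits 5, overwriting B's 8
    processed.update(range(committed, log_size))   # after a restart, 5-7 are processed again
    return processed


def skips_scenario(log_size: int) -> set:
    """A takes offsets 0-4 and B takes 5-9, but A crashes before processing."""
    processed = set(range(5, 10))                  # only B's share gets processed
    committed = 10                                 # B commits 10
    processed.update(range(committed, log_size))   # restart resumes at 10
    return set(range(log_size)) - processed        # offsets that were never processed


if __name__ == "__main__":
    print(sorted(k for k, n in duplicates_scenario(10).items() if n > 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print(sorted(skips_scenario(10)))                                      # [0, 1, 2, 3, 4]
```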

Alberto Martin