
While trying to get a deep understanding of the Kafka distribution model, one sentence from a StackOverflow answer caught my attention, and I can neither confirm nor refute it.

So, the more subscriber groups you have, the lower the performance is, as kafka needs to replicate the messages to all those groups and guarantee the total order.

As far as I understand from the Kafka docs, multiple consumer groups behave much like individual consumers. No replication is done within the brokers, since each consumer has its own offset for a given partition. The number of groups should therefore not add any significant overhead: all of the data is in one place, and only the offset differs. Is that correct?

If the quoted statement is correct instead, then there is no way to introduce multiple disjoint consumers without impacting throughput, since all consumers always query all of the partitions and some kind of copying is introduced. Note that this is unrelated to the number of consumer threads: threads only improve consumer performance and, as far as I can tell, don't interfere with broker operations.

Aleksandar Stojadinovic

2 Answers


I found the answer myself; it's in the new consumer API docs for Kafka 0.9 and later:

Conceptually you can think of a consumer group as being a single logical subscriber that happens to be made up of multiple processes. As a multi-subscriber system, Kafka naturally supports having any number of consumer groups for a given topic without duplicating data (additional consumers are actually quite cheap).

Bottom line: no, multiple consumer groups do not decrease performance, at least not significantly.
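
To make the "no duplication" point concrete, here is a toy sketch in Python. It is not Kafka code and does not use any Kafka client; `PartitionLog`, `poll`, and the group names are hypothetical names made up for illustration. It only models the idea from the docs: records are stored once, and each consumer group keeps its own position in them.

```python
from collections import defaultdict


class PartitionLog:
    """Toy stand-in for one partition (hypothetical, not a Kafka API)."""

    def __init__(self):
        self.records = []                   # the data is stored exactly once
        self.committed = defaultdict(int)   # group id -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        # Reading never copies or removes records; only the group's offset moves.
        start = self.committed[group]
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"msg-{i}")

billing = log.poll("billing", max_records=5)  # one group reads everything
audit = log.poll("audit", max_records=2)      # another group is further behind
```

Adding a third group in this model costs one more integer in `committed`, which is roughly why the docs describe additional consumers as cheap. Each group still reads the data itself, so the read traffic does grow with the number of groups.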

Aleksandar Stojadinovic

It does not affect the Kafka process's performance, but since two or more consumer groups mean two or more times as many reads from the Kafka servers, it does affect outgoing network traffic if you have lots of consumer groups. Beyond that, the data is mostly read from memory, which does not hurt performance, because RAM is far faster than network communication.
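
As a rough back-of-the-envelope sketch of that network effect: assuming every group reads the full stream once, outgoing consumer traffic scales linearly with the number of groups. The function name and the 50 MB/s ingest rate below are made-up values for illustration, and the estimate ignores replication traffic between brokers and any compression.

```python
def broker_egress_mb_per_s(ingest_mb_per_s, consumer_groups):
    """Outgoing consumer traffic if each group reads the whole stream once."""
    return ingest_mb_per_s * consumer_groups


# Hypothetical ingest rate of 50 MB/s, read by 1, 2, and 10 groups.
for groups in (1, 2, 10):
    print(groups, broker_egress_mb_per_s(50, groups))
# prints:
# 1 50
# 2 100
# 10 500
```

So the broker does no extra copying per group, but with enough groups the network link can become the limit before the broker itself does.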

halil