
I have 22 topics and ordering within a topic is important to me. I do not have any partitions.
Basically I have 11 tenants and I need two topics per tenant.
I am confused about whether to have a single consumer group for all 22 topics or have 22 consumer groups?
The load is low and consumption is not real-time; it is an offline process, so a lag of a few milliseconds won't hurt.

I am confused about the following points:
1. If I have one consumer group with one consumer running on a single machine (JVM - Spring Boot Application), will the consumer work with all topics using a single thread or will there be separate thread per topic? If it is a single thread, that thread may get overloaded. If there are multiple threads, I can achieve parallelism (utilize all the cores) without spinning up another machine.
2. If I have one consumer group listening to all topics with multiple consumers running on multiple machines (Multiple JVMs - Spring Boot Application), will the Zookeeper distribute the load from different topics to different machines? I understand that messages from one topic will always go to a single machine.

For example: if there are 2 consumers (one per machine) in a single consumer group listening to all 22 topics, and the 22 topics produce messages simultaneously, will the messages be distributed between the 2 machines, say topics 1-11 to machine one and topics 12-22 to machine two? I am just interested in load distribution.

Does it work this way (assuming equal load from all topics)?
2 machines -> messages from approx 11 topics per machine
4 machines -> messages from approx 5 topics per machine and so on.

rohanagarwal

2 Answers

First, to clarify the concepts:

  • A topic is just a logical unit.
  • Messages are ordered only within a partition.
  • "I do not have any partitions." is not possible. A topic must have at least one partition.
  • A consumer group is what provides horizontal scalability. If you have 5 partitions in your topic and 5 consumers in the same consumer group, Kafka assigns one partition to each consumer and consumption proceeds in parallel.

Answers to your questions:

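  1. One consumer means one thread (the Kafka consumer is not thread-safe). For parallelism you need more than one partition to consume (within a topic or across the subscribed topics) and more than one consumer in the same consumer group; consumers beyond the number of partitions sit idle. A consumer can subscribe to multiple topics.
  2. ZooKeeper is not involved on the consumer side. Kafka itself distributes partitions among the consumers in a group, using the configured partition assignor. Note that the long-standing default assignor, RangeAssignor, works topic by topic, so with many single-partition topics it can hand every partition to the same consumer; setting partition.assignment.strategy to RoundRobinAssignor spreads them evenly across consumers that share the same subscriptions.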
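
To make the load-distribution question concrete, here is a small plain-Java simulation of two assignment styles. It does not call the Kafka client; it only mirrors, as far as I understand them, the per-topic arithmetic of RangeAssignor and the cycling behaviour of RoundRobinAssignor. The class name, the consumer ids and the topic names (topic-01 ... topic-22) are made up for illustration, and all consumers are assumed to subscribe to all topics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative simulation only; not part of the Kafka client API.
final class AssignmentSketch {

    /** Hypothetical helper: n single-partition topics named topic-01, topic-02, ... */
    static Map<String, Integer> singlePartitionTopics(int n) {
        Map<String, Integer> topics = new TreeMap<>();
        for (int i = 1; i <= n; i++) {
            topics.put(String.format("topic-%02d", i), 1);
        }
        return topics;
    }

    /** Range style: each topic is split among the sorted consumers independently. */
    static Map<String, List<String>> rangeStyle(List<String> consumers, Map<String, Integer> topics) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        Map<String, List<String>> result = new TreeMap<>();
        for (String c : sorted) {
            result.put(c, new ArrayList<>());
        }
        for (Map.Entry<String, Integer> topic : new TreeMap<>(topics).entrySet()) {
            int partitions = topic.getValue();
            int perConsumer = partitions / sorted.size();
            int extra = partitions % sorted.size(); // the first 'extra' consumers get one more
            for (int i = 0; i < sorted.size(); i++) {
                int start = perConsumer * i + Math.min(i, extra);
                int length = perConsumer + (i < extra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    result.get(sorted.get(i)).add(topic.getKey() + "-" + p);
                }
            }
        }
        return result;
    }

    /** Round-robin style: all partitions of all topics are dealt out in turn. */
    static Map<String, List<String>> roundRobinStyle(List<String> consumers, Map<String, Integer> topics) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        Map<String, List<String>> result = new TreeMap<>();
        for (String c : sorted) {
            result.put(c, new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<String, Integer> topic : new TreeMap<>(topics).entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                result.get(sorted.get(next % sorted.size())).add(topic.getKey() + "-" + p);
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = singlePartitionTopics(22);
        List<String> two = List.of("consumer-a", "consumer-b");
        rangeStyle(two, topics).forEach((c, tps) -> System.out.println("range       " + c + " -> " + tps.size()));
        roundRobinStyle(two, topics).forEach((c, tps) -> System.out.println("round-robin " + c + " -> " + tps.size()));
    }
}
```

Under these assumptions, the range style gives all 22 single-partition topics to the first consumer and none to the second, while the round-robin style gives 11 to each (and 6, 6, 5, 5 with four consumers). Either way, every partition has exactly one owner, so ordering within a topic is unaffected.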

Maybe this video can be helpful to understand some core concepts better.

H.Ç.T

will the consumer work with all topics using a single thread or will there be separate thread per topic?

A single thread, because the KafkaConsumer documentation says:

The Kafka consumer is NOT thread-safe. All network I/O happens in the thread of the application making the call. It is the responsibility of the user to ensure that multi-threaded access is properly synchronized. Un-synchronized access will result in ConcurrentModificationException.
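
To illustrate what that quote means in practice, here is a minimal plain-Java sketch of a client that rejects overlapping calls from different threads. It is a hypothetical stand-in (ThreadConfinedClient and its poll method are invented names), only loosely modeled on the behaviour the documentation describes, and it is not the Kafka client's actual code.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a non-thread-safe client; not the Kafka API.
final class ThreadConfinedClient {
    private static final long NO_OWNER = -1L;
    private final AtomicLong owner = new AtomicLong(NO_OWNER);
    private int depth = 0; // only touched by the thread that currently owns the client

    private void acquire() {
        long me = Thread.currentThread().getId();
        // Allow re-entrant use by the owner; otherwise the client must be free.
        if (owner.get() != me && !owner.compareAndSet(NO_OWNER, me)) {
            throw new ConcurrentModificationException("client is not safe for multi-threaded access");
        }
        depth++;
    }

    private void release() {
        if (--depth == 0) {
            owner.set(NO_OWNER);
        }
    }

    /** Stand-in for a call such as poll(): runs 'work' while this thread owns the client. */
    void poll(Runnable work) {
        acquire();
        try {
            work.run();
        } finally {
            release();
        }
    }

    /** Returns true if a second thread is rejected while the first is still inside poll(). */
    static boolean overlappingCallIsRejected() {
        ThreadConfinedClient client = new ThreadConfinedClient();
        boolean[] rejected = {false};
        client.poll(() -> {
            Thread other = new Thread(() -> {
                try {
                    client.poll(() -> { });
                } catch (ConcurrentModificationException e) {
                    rejected[0] = true;
                }
            });
            other.start();
            try {
                other.join(); // join makes the other thread's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("overlapping call rejected: " + overlappingCallIsRejected());
    }
}
```

This is why the usual pattern is one consumer instance per thread: if you want more threads in a single JVM, you create more consumers rather than sharing one.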


If I have one consumer group listening to all topics with multiple consumers running on multiple machines ... will the Zookeeper distribute the load from different topics to different machines?

Yes, although ZooKeeper is not the component responsible for this.

Just a note: Kafka doesn't know anything about machines, it knows about consumer groups and consumers.


Now, let's answer the main question.

I am confused about whether to have a single consumer group for all 22 topics or have 22 consumer groups?

Since you have only one partition per topic, a single group of 22 consumers (all sharing one group.id) and 22 separate consumers each subscribed to only one topic give the same ordering guarantee, because:

each partition is assigned to exactly one consumer in the group.

pierDipi