29

Our cluster runs Kafka 0.11 and has strict restrictions on consumer groups. We cannot use arbitrary consumer groups, so an admin has to create the ones we need.

We run Kafka Connect HDFS Sinks to read data from topics and write to HDFS. All the topics have only one partition.

I am considering the following two patterns for using consumer groups with the Kafka HDFS Sink.

As shown in the pictures (not reproduced here):

Case 1: Each topic has its own consumer group.

Case 2: All the topics share a common consumer group.

I am aware that when a topic has multiple partitions and a consumer fails, another consumer in the same consumer group takes over that partition.

My question:

Does the same thing happen when multiple topics share the same consumer group? I.e., if a consumer (an HDFS Sink connector) fails, will another consumer (another HDFS Sink connector) take over the work and read from that topic?

Update: Each Kafka HDFS Sink connector subscribes to only one topic.

Radix
Ashika Umanga Umagiliya

4 Answers

87

I'm surprised that all the answers saying "yes" are wrong. I just tested it: using the same group.id for consumers of different topics works fine and does NOT mean that they share messages, because for Kafka the key is (topic, group) rather than just (group). Here is what I did:

  1. created 2 different topics T1 and T2 with 2 partitions in each topic
  2. created 2 consumers with the same group xxx
  3. assigned consumer C1 to T1, consumer C2 to T2
  4. produced messages to T1 - only consumer C1 assigned to T1 processed them
  5. produced messages to T2 - only consumer C2 assigned to T2 processed them
  6. killed consumer C1 and repeated steps 4-5. Only consumer C2 processed messages from T2
  7. messages from T1 were not processed

Conclusion: Consumers with the same group name subscribed to different topics will NOT consume messages from other topics, because the key is (topic, group)
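
The experiment above can be modeled without a broker. The following is a minimal Python sketch of a range-style assignment, assuming (as Kafka's built-in assignors do) that a topic's partitions are only handed to group members that subscribe to that topic. The function name `range_assign` is hypothetical, and this is a toy model for illustration, not Kafka client code.

```python
def range_assign(partitions_per_topic, subscriptions):
    """Toy range-style assignment for one consumer group.

    partitions_per_topic: dict of topic name -> number of partitions
    subscriptions: dict of member id -> set of subscribed topic names
    Returns a dict of member id -> list of (topic, partition) tuples.
    """
    assignment = {member: [] for member in subscriptions}
    for topic, num_partitions in sorted(partitions_per_topic.items()):
        # Only members that subscribed to this topic are candidates for it.
        members = sorted(m for m, topics in subscriptions.items() if topic in topics)
        if not members:
            continue  # nobody in the group reads this topic
        per_member, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` members each take one additional partition.
            count = per_member + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


topics = {"T1": 2, "T2": 2}

# Steps 1-5: same group, C1 subscribed to T1 only, C2 subscribed to T2 only.
before = range_assign(topics, {"C1": {"T1"}, "C2": {"T2"}})

# Step 6: C1 is killed, so only C2 remains in the group.
after = range_assign(topics, {"C2": {"T2"}})

print(before)
print(after)
```

In this model, once C1 is gone nobody left in group xxx subscribes to T1, so its partitions stay unassigned, which matches step 7.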

borN_free
  • 1
    Yes, this is the behavior I expected. I think others assumed that my consumers subscribe to both T1 and T2. – Ashika Umanga Umagiliya Oct 18 '19 at 07:04
  • @AshikaUmangaUmagiliya feel free to accept the correct answer then – borN_free Oct 23 '19 at 07:15
  • 10
    @borN_free I don't think anyone claimed they would see records from topics they don't own. To complete your experiment, try to force a rebalance by adding partitions to one of the topics, and you will see ALL consumers (even those that don't care about that topic) stop to rebalance. – radai Oct 24 '19 at 18:05
  • In step 1, if you create T1 with only ONE partition and T2 still with two, what would happen? When starting Kafka, C2 complains about not having enough partitions, even though T2 has enough. Have you seen that? – user10375 Apr 13 '20 at 19:33
  • 1
    @borN_free Could you please share a code example of how you did it? – MSIslam Oct 28 '20 at 13:20
  • 2
    @borN_free I think the question is what happens if consumers C1 and C2 each subscribe to both topics. I think in that case, if you kill C1, C2 will read from both topics. – Victoriia Jan 11 '21 at 16:43
  • I think consuming different topics with the same consumer group id works just fine, but over time it causes a **Broker: Group rebalance in progress** error. – VR1256 May 11 '22 at 14:59
2

Absolutely yes. The Kafka consumers should monitor both topics, and then Kafka will assign the partitions (per topic) to the currently active members of the consumer group.

Regardless of whether each topic has one or multiple partitions, the remaining consumers in the group take over the partitions of every topic whenever a consumer in that group fails. When a failure happens, Kafka always triggers the rebalancing process to distribute the partitions among the remaining active consumers of the group, and as a consequence the work continues on those topics.

Alexadreison
  • 2
    My question is not about rebalancing partitions. It's about whether rebalancing happens across different topics in the same consumer group. – Ashika Umanga Umagiliya Sep 02 '19 at 10:37
  • The answer is yes. If the consumers were registered/subscribed on both topics, then they will start getting messages from the other topic when one consumer fails. – Alexadreison Sep 02 '19 at 10:58
  • @Alexadreison If the HDFS sinks are different, then what's the point of writing the data read from different topics into different sinks? – semicolon May 06 '22 at 06:27
1

Yes, as long as both consumers subscribe() to the same set of topics (topicA and topicB), the partitions of all topics will be distributed across all consumers.

In your case this would mean that if one of the consumers fails, both topics will be assigned to the surviving consumer.
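
To make the failover concrete, here is a small Python sketch under stated assumptions: a round-robin-style strategy, two single-partition topics, and two group members that both subscribe to both topics. The function name `round_robin_assign` and the member ids `sink-1` and `sink-2` are hypothetical. It is a toy model of the idea rather than Kafka's implementation, and the initial split can differ under other assignment strategies.

```python
from itertools import cycle


def round_robin_assign(partitions_per_topic, subscriptions):
    """Toy round-robin-style assignment for one consumer group.

    partitions_per_topic: dict of topic name -> number of partitions
    subscriptions: dict of member id -> set of subscribed topic names
    Returns a dict of member id -> list of (topic, partition) tuples.
    """
    members = sorted(subscriptions)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    ring = cycle(members)  # walk the members in a fixed circular order
    for topic in sorted(partitions_per_topic):
        # Skip topics that no group member subscribes to.
        if not any(topic in topics for topics in subscriptions.values()):
            continue
        for partition in range(partitions_per_topic[topic]):
            member = next(ring)
            # Advance past members that are not subscribed to this topic.
            while topic not in subscriptions[member]:
                member = next(ring)
            assignment[member].append((topic, partition))
    return assignment


topics = {"topicA": 1, "topicB": 1}
both = {"topicA", "topicB"}

# Both members alive and subscribed to both topics.
healthy = round_robin_assign(topics, {"sink-1": both, "sink-2": both})

# sink-1 fails; the rebalance runs again with the surviving member only.
failed_over = round_robin_assign(topics, {"sink-2": both})

print(healthy)
print(failed_over)
```

The survivor ends up owning both partitions only because it subscribed to both topics. A member subscribed to a single topic would never be handed the other one.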

radai
0

The question asked is: if a consumer in a consumer group fails, will the other consumers in the same group pick up the subscribed topics and resume processing?

The accepted answer covers the scenario where topics are assigned to consumers. If it is automatic assignment (i.e., subscribe), then the idle consumers in the group should pick up the work of the failed consumer and start reading from the last committed offset. Otherwise it would break the consumer group parallelism architecture.

Just look at this answer: Kafka consumer for multiple topic

Ramya B