114

We are planning to write a Kafka consumer (Java) that reads a Kafka queue and performs the action specified in each message.

As the consumers run independently, will each message be processed by only one consumer at a time? Or will all the consumers process the same message, since each has its own offset in the partition?

Please help me understand.

Karan Khanna
shiv455
  • 1
    looks like kafka doesn't have queues. it has only topics – gstackoverflow Sep 29 '17 at 12:48
  • 4
    All kafka topics are ordered sets - in other words, they are queues. – Rodney P. Barbati Mar 30 '18 at 18:13
  • 8
    Kafka `topics` are not queues, because once a message is consumed from a `topic`, it stays there (unless its lifetime has expired) and the `offset` moves to the next message, whereas with a queue, a consumed message is removed from the queue. Ordering is also guaranteed only within a `partition`. – jumping_monkey Jun 05 '20 at 14:16

3 Answers

215

It depends on the Group Id. Suppose you have a topic with 12 partitions. If you have 2 Kafka consumers with the same Group Id, each will read 6 partitions, meaning they read different sets of partitions and therefore different sets of messages. If you have 4 Kafka consumers with the same Group Id, each of them will read 3 different partitions, and so on.
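
To make the arithmetic concrete, here is a minimal, broker-free sketch of how one group might split a topic's partitions. It is a simplified model only: the class name `GroupAssignmentSketch`, the consumer ids, and the round-robin dealing are assumptions for illustration, and Kafka's real assignors are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of how ONE consumer group divides a topic's partitions (not the Kafka API). */
final class GroupAssignmentSketch {

    /** Deals partitions 0..partitionCount-1 to the group members in round-robin order. */
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumerIds) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String id : consumerIds) {
            assignment.put(id, new ArrayList<>());
        }
        if (consumerIds.isEmpty()) {
            return assignment; // nobody to assign to
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            // Each partition gets exactly one owner within the group.
            String owner = consumerIds.get(partition % consumerIds.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 12 partitions, 2 consumers in the same group: 6 partitions each.
        System.out.println(assign(12, List.of("consumer-1", "consumer-2")));
        // 12 partitions, 4 consumers in the same group: 3 partitions each.
        System.out.println(assign(12, List.of("c1", "c2", "c3", "c4")));
    }
}
```

In this model, a group with more members than partitions leaves the extra members with an empty list, which matches the idea that a partition has only one reader per group.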

But when you set different Group Ids, the situation changes. If you have two Kafka consumers with different Group Ids, each will read all 12 partitions without any interference from the other, meaning both consumers will read the exact same set of messages independently. If you have four Kafka consumers with different Group Ids, each of them will read all partitions, and so on.
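
The reason groups do not interfere is that the read position is tracked per group, and reading does not remove messages. A toy, in-memory sketch of that idea follows; `GroupOffsetsSketch` and the group names `realtime` and `hdfs-export` are hypothetical, and a single list stands in for one partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-partition log with one read offset per consumer group (not the Kafka API). */
final class GroupOffsetsSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

    /** Appends a message; it stays in the log no matter who reads it. */
    void append(String message) {
        log.add(message);
    }

    /** Returns the messages this group has not read yet and advances only this group's offset. */
    List<String> poll(String groupId) {
        int from = nextOffsetByGroup.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        nextOffsetByGroup.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        GroupOffsetsSketch topic = new GroupOffsetsSketch();
        topic.append("m1");
        topic.append("m2");
        System.out.println(topic.poll("realtime"));    // first group reads both messages
        System.out.println(topic.poll("hdfs-export")); // second group reads the same messages
        System.out.println(topic.poll("realtime"));    // nothing new for the first group
    }
}
```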

spy
Lukáš Havrlant
  • Actually, I would like to have only 3 consumers (same code) running as daemon services on Linux machines in AWS, polling messages from the queue. So you mean I need to assign the same groupId to all 3 so that only one consumer processes a message at a time? And how will the other consumers know that the message was processed successfully, so that they will not pick it up for processing? – shiv455 Feb 22 '16 at 19:12
  • 14
    You cannot inform other consumers that a message hasn't been processed correctly. But if one consumer fails, another consumer will take over its work. Meaning: if you have 12 partitions and 3 consumers with the same Group Id, each consumer reads 4 partitions. If one consumer fails, [rebalancing](http://stackoverflow.com/questions/27181693/how-does-consumer-rebalancing-work-in-kafka) occurs and the two living consumers will now read 6 partitions each. Be aware that if you don't commit the offset after every message, some messages can be read more than once. – Lukáš Havrlant Feb 22 '16 at 19:30
  • Sorry, I guess my question is confusing. Let me break it up. 1. Suppose a message is read by a consumer and the offset is committed, but during processing the consumer's external dependencies fail and the message is not processed, while the consumer itself stays up and running. How will the message be retried, given that the offset already points the consumer at the next message? 2. If a consumer is processing messages from a particular partition, manages to process a few messages and then dies, you said there will be rebalancing and the partitions will be redistributed. How will the other consumers know the offset of the dead consumer? – shiv455 Feb 22 '16 at 19:48
  • 1) You can use the low-level consumer API (or the entirely new consumer API in Kafka 0.9, which I haven't read about yet); it lets you manage offset committing yourself. That means you can wait until the message is finally processed and save the offset after that. There is no easy way to reprocess a message that has already been processed and committed. I think in that case you need to start a new consumer and tell it to consume the message with offset XYZ again, or something like this. – Lukáš Havrlant Feb 22 '16 at 20:01
  • 2
    2) The offset is defined by topic, partition and group id. The living consumers with the same group id can retrieve the offset because they read the same topic and they have the same group id. – Lukáš Havrlant Feb 22 '16 at 20:03
  • Thanks @Lukas Havrlant. We have a Java REST API which needs to work as a producer and write into the queue. Does the producer need to know which partition it should write the message to? Please suggest a sample where I can start and include the producer logic in my REST API. – shiv455 Feb 22 '16 at 20:53
  • And for 1), I guess if the external dependencies fail I'll not commit the offset, so that the message will be retried. Does this sound like a workaround? – shiv455 Feb 22 '16 at 20:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/104306/discussion-between-shiv455-and-lukas-havrlant). – shiv455 Feb 23 '16 at 14:30
  • why would one want to consume the same message twice by two different consumers? – ffff Aug 26 '16 at 19:05
  • 4
    @FaizHalde: In our case: first, we consume each message for realtime processing and later on we consume the same set of messages for the second time when we transfer message from Kafka to HDFS for further analysis. In general, if you have multiple microservices, each of them could read the same messages and do different stuff with them. – Lukáš Havrlant Mar 26 '17 at 09:03
  • 3
    What happens when there are more consumers in the same group, let's say 14, and only 12 partitions? Can the redundant consumers still connect to Kafka? – Bianca Tesila Aug 23 '18 at 13:32
  • 4
    @BiancaTesila The two remaining consumers would be connected but they would read nothing. Basically they would be inactive. – Lukáš Havrlant Aug 24 '18 at 14:28
  • 1
    @LukášHavrlant Won't the topic be messed up by the offsets from one consumer group to another? If one consumer group completes the processing, it will commit the offset. But if the other consumer group is not done with the processing, will the same data in the topic still be available for it? – OK999 Mar 27 '19 at 19:55
  • What if the consumers in a single consumer group are more than the number of partitions? Then multiple consumers might end up reading from the same partition right? Wouldn't that cause some unwanted side effects like processing the same data twice? – AV94 Apr 11 '19 at 17:18
  • what if the topic has one partition but multiple consumers within same group, how would that work? – lollerskates Jul 22 '21 at 15:41
  • good reference:https://medium.com/@jhansireddy007/how-to-parallelise-kafka-consumers-59c8b0bbc37a – Baodi Di Oct 16 '21 at 04:55
  • I have one partition and two consumers in the same group, and I always commit after processing 100 messages, but I got the same messages on another consumer (randomly). Is this correct? – Vikram Biwal Oct 30 '21 at 18:12
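
The retry idea from the comments above (skip the commit when processing fails, so the message is read again) can be sketched without a broker as follows. This is a toy model with a hypothetical class name; with the real Java client the closest equivalents would be setting `enable.auto.commit=false` and calling `commitSync()` after successful processing. Note one simplification: a real consumer's in-memory position still advances after a poll, so retrying inside the same running consumer would also need a `seek()` back to the failed offset, whereas this sketch simply re-reads from the committed offset.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy model of "process first, commit afterwards" on one partition (not the Kafka API). */
final class CommitAfterProcessingSketch {
    private final List<String> partition;
    private int committedOffset = 0; // offset of the next record to process

    CommitAfterProcessingSketch(List<String> partition) {
        this.partition = partition;
    }

    /**
     * Processes records starting at the committed offset and stops at the first failure.
     * The handler returns true on success. Returns how many records were committed in this pass.
     */
    int runOnce(Predicate<String> handler) {
        int processed = 0;
        while (committedOffset < partition.size()) {
            String message = partition.get(committedOffset);
            if (!handler.test(message)) {
                break; // no commit: the same record is read again on the next pass
            }
            committedOffset++; // commit only after successful processing
            processed++;
        }
        return processed;
    }

    int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        CommitAfterProcessingSketch consumer =
                new CommitAfterProcessingSketch(List.of("a", "b", "c"));
        // First pass: the handler fails on "b", so only "a" is committed.
        System.out.println(consumer.runOnce(m -> !m.equals("b")));
        // Second pass: the dependency has recovered, so "b" is retried and "c" follows.
        System.out.println(consumer.runOnce(m -> true));
    }
}
```

Because a crash can happen after processing but before the commit, this pattern can process a message more than once, so the action in the message should tolerate being repeated.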
88

I found this image from O'Reilly helpful:

[Image: Kafka]

Within same group: NO

  • Two consumers (Consumer 1 and Consumer 2) within the same group (Group 1) CANNOT consume the same message from a partition (Partition 0).

Across different groups: YES

  • Two consumers in two different groups (Consumer 1 from Group 1, Consumer 1 from Group 2) CAN consume the same message from a partition (Partition 0).
nurgasemetey
SynergyChen
33

Kafka will deliver each message in the subscribed topics to one process in each consumer group. This is achieved by balancing the partitions between all members in the consumer group so that each partition is assigned to exactly one consumer in the group. Conceptually you can think of a consumer group as being a single logical subscriber that happens to be made up of multiple processes.

In simpler words, each Kafka message/record is processed by only one consumer process per consumer group. So if you want multiple consumers to process the same message/record, put those consumers in different groups.
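
For the setup in the question (several identical daemons that should share the work), this comes down to giving every instance the same `group.id`. Below is a minimal configuration sketch using only the JDK so that it compiles on its own; the broker address, the group name `action-workers`, and the class name are assumptions, and in a real application the resulting `Properties` would be handed to a `KafkaConsumer` from the `kafka-clients` library.

```java
import java.util.Properties;

/** Builds consumer settings; every daemon that should share the work uses the same group id. */
final class ConsumerConfigSketch {

    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", groupId);                   // same value on every daemon
        props.setProperty("enable.auto.commit", "false");         // commit manually after processing
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // All instances of the daemon would start with the same (hypothetical) group name.
        System.out.println(consumerProps("action-workers"));
    }
}
```

A second application that needs its own copy of every message would use the same settings with a different `group.id`.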

Karan Khanna