
I am a newbie to Kafka and learning Kafka internals. Please feel free to correct my understanding as required.

Here is my real-time scenario; I appreciate all responses:

  1. I have a real-time FTP server which receives data files, let's say claims files.
  2. I will publish this data into a topic; let's call the topic claims_topic (2 partitions). (A rough producer sketch follows this list.)
  3. I need to subscribe to claims_topic, read the messages, and write them to an Oracle table and a Postgres table. Let's call the Oracle table Otable and the Postgres table Ptable.
  4. I need to capture every topic message and write it to both Otable and Ptable. Basically, Otable and Ptable have to stay in sync.
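
For illustration, this is roughly how I plan to publish each claim (a minimal sketch using the plain Java Kafka producer; the broker address, key, and payload below are placeholders, not my real values):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class ClaimsProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // one record per claim read from the FTP drop; payload shown as a plain string
                producer.send(new ProducerRecord<>("claims_topic", "claim-id-1", "claim payload"));
                producer.flush();
            }
        }
    }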

Assume that I will write two consumers, one for Oracle and the other for Postgres.

Question 1: Should the two consumers be in the same consumer group? I believe not, as that would lead to each consumer getting messages only from one particular partition.

Question 2: If my answer to Question 1 is indeed correct, then please enlighten me: in what cases are multiple consumers grouped under the same consumer group? A real-world scenario would be much appreciated.

mnaray

2 Answers


A consumer group is a logical name that groups an application's consumers together; they work together to finish processing the data in a topic. Each partition can be handled by only one consumer of a consumer group, which makes the partition count the upper limit on parallel consumption/processing power for a topic. Each consumer in a consumer group handles one or more partitions. If you have one consumer on a topic with many partitions, it will handle all the partitions by itself; if you add more consumers to the same consumer group, they will divide ("rebalance") the topic's partitions among themselves. Hope this clears things up.

When setting up a consumer you configure its group id; this is the consumer group. Two separate consumers with the same group id become members of the same consumer group.
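
For illustration, a minimal sketch with the plain Java client (the broker address and group name are placeholders); the group.id property is the only thing that decides group membership:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ClaimsConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
            props.put("group.id", "claims-app");                // same value = same consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("claims_topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // process the message here (e.g. write it to the target table)
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }

Start this program twice with the same group.id and the two instances split the topic's partitions between them; change the group.id in one of them and that instance gets its own full copy of every message.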

In cases where there is high produce throughput and one consumer cannot handle the pressure, you can scale out by running more consumers with the same consumer group so they work together to process the topic; each instance takes ownership of different partitions.

For your use case, a complete sync of Postgres and Oracle won't be easily achievable. You could use Kafka Connect to move data from your topic to your targets with the relevant sink connectors, but then again they will only be "eventually consistent", as they do not share an atomic transaction.

I would explore the Spring Data transactional layer:

Spring @Transactional with a transaction across multiple data sources
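
Roughly what that chained-transaction idea could look like, assuming Spring Data's ChainedTransactionManager (best-effort only, and deprecated in newer Spring Data versions) and two already-configured transaction managers; all bean and class names here are placeholders:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.transaction.ChainedTransactionManager;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.annotation.Transactional;

    @Configuration
    class ChainedTxConfig {

        // Chains the two transaction managers: both transactions are started together
        // and committed/rolled back together on a best-effort basis. This is NOT a
        // true XA/two-phase commit, so a crash between the two commits can still
        // leave the tables out of sync.
        @Bean
        public PlatformTransactionManager chainedTxManager(
                PlatformTransactionManager oracleTxManager,
                PlatformTransactionManager postgresTxManager) {
            return new ChainedTransactionManager(oracleTxManager, postgresTxManager);
        }
    }

    @Service
    class ClaimWriter {

        @Transactional(transactionManager = "chainedTxManager")
        public void writeClaim(String claim) {
            // insert the claim into Otable via the Oracle repository/JdbcTemplate
            // insert the claim into Ptable via the Postgres repository/JdbcTemplate
        }
    }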

Ran Lupovich
  • Thank you for the reply. My understanding is that each consumer is a separate, independent process connecting to the topic. With that in mind, how are consumers grouped under the same consumer group? What is the grouping criterion used to put consumers in the same group? – mnaray Jun 04 '21 at 16:57
  • When setting up a consumer you configure its group id; this is the consumer group. Two separate consumers with the same group id become members of the same consumer group. – Ran Lupovich Jun 04 '21 at 16:59
  • Yes, I know we use group.id while setting up the consumer to define its consumer group. My question is more about when I need to use the same group for independent processes. For the use case I posted in my question, the Oracle and Postgres processes have to be in separate groups, right? – mnaray Jun 04 '21 at 17:14
  • If your ingestion processes for Postgres and Oracle are separate, then you must use separate groups in order for each to get all messages from the topic. In cases where there is high produce throughput and one consumer cannot handle the pressure, you can scale out by running more consumers with the same consumer group to work together on the topic; each instance takes ownership of different partitions. – Ran Lupovich Jun 04 '21 at 17:18
  • Okay, so if I have to achieve a parallelism of 3, then I should replicate the same consumer process three times and start all of them under the same consumer group, correct? – mnaray Jun 04 '21 at 17:42
  • Investigating your use case some more, I came across this post; my suggestion is to invest some time exploring chained transactions in the Spring Data framework. Good luck. – Ran Lupovich Jun 04 '21 at 17:45
  • https://stackoverflow.com/questions/48954763/spring-transactional-with-a-transaction-across-multiple-data-sources – Ran Lupovich Jun 04 '21 at 18:59

No, both consumers should not be in the same consumer group, because each of them needs to consume all of the topic's data separately and write it to Otable and Ptable respectively.

If both consumers were in one consumer group, then Otable would get data from one partition and Ptable would get data from the other partition (because you have 2 partitions).

In my opinion, use two consumers with two consumer groups; then, if there is high traffic on your topic, you can scale the number of consumers separately for Otable and Ptable.

If you need two consumers writing to Ptable, use the same group id for both; the topic's traffic will then be shared among those consumers (in your case, the maximum number of consumers for one group is 2, because your topic has only 2 partitions). If you need this for Otable, follow the same approach.
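
A rough sketch of that setup with the plain Java client (the group names and broker address are placeholders): each distinct group.id receives every message on the topic, and up to 2 instances can share one group because the topic has 2 partitions.

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;

    public class ClaimsConsumers {

        // Builds a consumer subscribed to claims_topic for the given consumer group.
        static KafkaConsumer<String, String> consumerFor(String groupId) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", groupId);
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("claims_topic"));
            return consumer;
        }

        public static void main(String[] args) {
            // Separate groups: each group gets the full stream independently.
            KafkaConsumer<String, String> otableConsumer = consumerFor("otable-writers");
            KafkaConsumer<String, String> ptableConsumer = consumerFor("ptable-writers");
            // Poll each consumer in its own thread (or process) and write to its table;
            // to scale a group out, start another instance with the same group id.
        }
    }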

nipuna
  • Thank you for the reply. How do I scale the number of consumers? I believe I can start multiple instances of the consumer code? Please correct me or add if there are more ways. Thanks. – mnaray Jun 05 '21 at 17:13
  • You can start multiple consumers belonging to a single consumer group; each will then consume from different partitions of your topic. But the maximum number of consumer instances should be the number of partitions of the topic. – nipuna Jun 05 '21 at 18:08