
Assume a topic has 2 partitions. We have 2 consumers in the same group, both consuming from that topic: Consumer A consumes partition 0 and Consumer B consumes partition 1. Consumer A is the group leader of the consumer group.

At some point, Consumer B fetched a batch of messages from partition 1, for example messages X and Y. Right after that, Consumer B stopped.

After a while, Consumer A decides that Consumer B is dead, triggers a rebalance and starts consuming from partition 1. It gets messages X, Y, Z (in that order) and writes them to a database.

After that, Consumer B resumes execution with no idea that any time has passed, and goes on to write messages X and Y, overwriting the effect of Z. Then Consumer B fails completely.
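To make the interleaving concrete, here is a hypothetical in-memory model of the scenario (not Kafka code): all three records update the same row, and the sink is a naive upsert that overwrites whatever is stored. The row key `row-1` and the helper names are made up for illustration.

```python
# Hypothetical model of the question's scenario: partition 1 holds three
# records that all update the same database row, and the sink does a naive
# upsert that overwrites whatever is stored. Names and keys are made up.


def naive_upsert(db: dict, key: str, value: str) -> None:
    """Blindly overwrite the row for `key`; no staleness check."""
    db[key] = value


def replay_scenario() -> dict:
    partition_1 = [("row-1", "X"), ("row-1", "Y"), ("row-1", "Z")]
    db: dict = {}

    # Consumer B fetches X and Y, then is paused before writing anything.
    b_batch = partition_1[:2]

    # After the rebalance, Consumer A owns partition 1 and writes X, Y, Z.
    for key, value in partition_1:
        naive_upsert(db, key, value)

    # Consumer B resumes, unaware it left the group, and writes its old batch.
    for key, value in b_batch:
        naive_upsert(db, key, value)

    return db


print(replay_scenario())  # {'row-1': 'Y'}: Z has been overwritten
```

Nothing in a plain overwrite can tell the late write from B apart from a legitimate one, which is exactly the concern raised in the question.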

Is this possible? If so, the simple approach of consuming messages and upserting them into a database might not be safe.

1 Answer


Consumer A thinks that Consumer B is dead and decides to rebalance and consume from partition 1. It gets messages X, Y, Z (in that order) and then writes them to a database

Okay, let's say it commits offsets for each of those messages as well.

If B starts back up, it will resume reading from the most recently committed offset, and will not re-read data it had previously fetched unless the consumer is manually seeked back to that position.
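A minimal sketch of the rule the answer relies on, assuming Kafka's convention that the committed offset is the offset of the *next* record to consume (last processed offset + 1). The partition contents and offsets are hypothetical. Note that this only governs what a consumer fetches after (re)joining; it says nothing about records B is already holding in memory, which is the point the comments below push back on.

```python
from typing import List, Optional


def records_to_read(log: List[str], committed: Optional[int]) -> List[str]:
    """Records a consumer reads after (re)joining, given the committed offset.

    The committed offset is the offset of the NEXT record to consume.
    With no committed offset we assume reading from the beginning, similar
    to auto.offset.reset=earliest.
    """
    start = 0 if committed is None else committed
    return log[start:]


partition_1 = ["X", "Y", "Z"]  # offsets 0, 1, 2 (hypothetical contents)

# Consumer A processed X, Y, Z and committed offset 3 (next record to read).
committed_by_a = 3

print(records_to_read(partition_1, committed_by_a))  # []
print(records_to_read(partition_1, 1))               # ['Y', 'Z']
```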

OneCricketeer
  • It stopped right after receiving the messages but before writing to the database, so it never checked the committed offsets topic to find out whether its offsets were stale. – Quang Tùng Jun 21 '21 at 12:46
  • B doesn't fail, it's just paused, because of a GC pause, a VM pause, or someone accidentally sending a stop signal to B. – Quang Tùng Jun 21 '21 at 12:47
  • Failure or pausing would still cause a rebalance and loss of processed record state – OneCricketeer Jun 21 '21 at 20:26
  • B can detect that it has read old offsets, but ONLY after B has already written to the database. The problem above is still there. – Quang Tùng Jun 22 '21 at 00:55
  • A pause of B can cause A to rebalance, but B doesn't know right away that it has already left the group when it performs the write. – Quang Tùng Jun 22 '21 at 01:17
  • I'm not sure I understand your problem. Ideally, you wouldn't write a database client in a consumer on your own anyway, and would rather use the Kafka Connect framework, which has fault-tolerance controls. – OneCricketeer Jun 22 '21 at 16:11
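The thread leaves open how B could detect staleness *at* write time rather than after it. One mitigation commonly used for this kind of paused-writer problem (not proposed in the thread itself) is to store the source offset next to each row and make the upsert conditional on the incoming offset being newer, in a single statement. The sketch below uses SQLite's `INSERT ... ON CONFLICT DO UPDATE ... WHERE` (SQLite 3.24+); the table, column names and helper are hypothetical. It assumes every update to a given row comes from the same partition, because offsets are only ordered within one partition.

```python
import sqlite3

# Hypothetical schema: each row remembers the Kafka offset of the record that
# last wrote it (src_offset), so a stale writer can be detected at write time.
SCHEMA = """
CREATE TABLE records (
    row_id     TEXT PRIMARY KEY,
    payload    TEXT NOT NULL,
    src_offset INTEGER NOT NULL
)
"""

# The comparison and the write happen in one statement, so there is no gap
# between "check" and "write" for a paused consumer to slip into.
GUARDED_UPSERT = """
INSERT INTO records (row_id, payload, src_offset) VALUES (?, ?, ?)
ON CONFLICT (row_id) DO UPDATE SET
    payload = excluded.payload,
    src_offset = excluded.src_offset
WHERE excluded.src_offset > records.src_offset
"""


def guarded_upsert(conn: sqlite3.Connection, row_id: str, payload: str,
                   src_offset: int) -> bool:
    """Apply the write; return True if a row changed, False if it was stale."""
    before = conn.total_changes
    conn.execute(GUARDED_UPSERT, (row_id, payload, src_offset))
    conn.commit()
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Consumer A processes partition 1: X@0, Y@1, Z@2 (hypothetical offsets).
for payload, off in [("X", 0), ("Y", 1), ("Z", 2)]:
    guarded_upsert(conn, "row-1", payload, off)

# Paused consumer B wakes up and replays its stale batch X@0, Y@1.
stale = [guarded_upsert(conn, "row-1", p, o) for p, o in [("X", 0), ("Y", 1)]]

print(stale)  # [False, False]
print(conn.execute("SELECT payload, src_offset FROM records").fetchall())  # [('Z', 2)]
```

The `False` return values are how B would learn, during the write itself, that a newer record has already been applied.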