
I've run a stress test of a consumer-producer application that uses Kafka transactions to achieve exactly-once delivery. Two instances, each running two concurrent consumers, were launched simultaneously, and network drops were simulated (using docker network disconnect bridge) to test the exactly-once behavior. The test produced lots of duplicates in the outbound topic.
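For context, each consumer thread runs a consume-process-produce loop roughly like the sketch below (simplified: the bootstrap address, topic names, group id, and transactional.id scheme are placeholders, and the actual record processing is omitted):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.errors.ProducerFencedException;

    public class ExactlyOnceLoop {
        public static void main(String[] args) {
            Properties cc = new Properties();
            cc.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            cc.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            cc.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // offsets go through the tx
            cc.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
            cc.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringDeserializer");
            cc.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringDeserializer");

            Properties pc = new Properties();
            pc.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            pc.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "tx-" + UUID.randomUUID()); // placeholder scheme
            pc.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringSerializer");
            pc.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cc);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(pc)) {
                producer.initTransactions();
                consumer.subscribe(Collections.singletonList("inbound-topic")); // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) continue;
                    producer.beginTransaction();
                    try {
                        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                        for (ConsumerRecord<String, String> r : records) {
                            // real processing omitted; the output record is a stand-in
                            producer.send(new ProducerRecord<>("outbound-topic", r.key(), r.value()));
                            offsets.put(new TopicPartition(r.topic(), r.partition()),
                                        new OffsetAndMetadata(r.offset() + 1));
                        }
                        // offsets are committed through the producer, inside the transaction
                        producer.sendOffsetsToTransaction(offsets, "my-group");
                        producer.commitTransaction();
                    } catch (ProducerFencedException e) {
                        return; // fenced by a producer with the same transactional.id; exit
                    } catch (KafkaException e) {
                        producer.abortTransaction(); // to retry, seek back to last committed offsets (omitted)
                    }
                }
            }
        }
    }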

Here's what happened:

Instance 1 connected to Kafka and started consuming messages, but none of its polls returned records from partition X; instance 2, however, was processing that partition. In the __consumer_offsets topic I can see that commits for this partition were performed (I commit offsets through the producer, as you would with a transactional producer).
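(The committed offset can also be cross-checked programmatically; a minimal sketch using the consumer from the loop above, with a placeholder topic and partition number:)

    // Read back the group's committed offset for partition X
    // ("inbound-topic" and partition 0 are placeholders).
    TopicPartition partitionX = new TopicPartition("inbound-topic", 0);
    OffsetAndMetadata committed = consumer.committed(partitionX);
    System.out.println(partitionX + " committed at: "
            + (committed == null ? "none" : committed.offset()));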

At 10:35:36 the producer committed the offset of the last record from partition X.
However, for some reason, at 10:36:31 the other instance received a batch with the first records from partition X, and as a result the partition was processed twice.

What might be the reason for this behavior? I'm not using Spring or any other Kafka adapters.

  • Just to double-check: all consumers were using the same group id? – radai Nov 14 '19 at 02:32
  • @radai Yep, they were. I think I understand why it happened. After the consumer was assigned partition X, it lost its connection to the Kafka cluster but kept the current offset for this partition in memory. While the network was down, the other instance processed the partition, and when the connection was restored the first consumer most likely polled records using the offset stored in memory (it was 0 in this case, but I've also reproduced the issue with the consumer starting somewhere in the middle, so it could be any offset); see the sketch after these comments. Might post this as an answer later. – Никита Михайлов Nov 14 '19 at 09:22
  • It should have been kicked out of the consumer group after disconnecting. – radai Nov 14 '19 at 16:49
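To make the failure mode described in the comments concrete: a zombie consumer that reconnects with a stale in-memory position can hand records to a transaction that another instance has already committed. A hypothetical guard like the one below (names follow the loop sketch above; note that a check-then-commit is still racy, so it narrows the duplicate window rather than closing it) would have flagged the stale batch before committing:

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    final class StalenessGuard {
        // Returns true if some other instance has already committed offsets past
        // the start of this batch, i.e. we are about to process records twice.
        static <K, V> boolean batchAlreadyCommitted(KafkaConsumer<K, V> consumer,
                                                    ConsumerRecords<K, V> records) {
            for (TopicPartition tp : records.partitions()) {
                OffsetAndMetadata committed = consumer.committed(tp);
                long batchStart = records.records(tp).get(0).offset();
                if (committed != null && committed.offset() > batchStart) {
                    return true;
                }
            }
            return false;
        }
    }

Calling this right before sendOffsetsToTransaction and aborting the transaction when it returns true would have caught the 10:36:31 batch, since the commit from 10:35:36 was already ahead of its first offset.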

0 Answers