I ran a stress test on a consumer-producer application that uses Kafka transactions to achieve exactly-once delivery. Two instances, each running two concurrent consumers, were launched simultaneously, and network drops were simulated (using docker network disconnect bridge) to exercise the exactly-once guarantee. The test left a large number of duplicates in the outbound topic.
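For context, the clients are configured roughly as below. This is a simplified sketch, not my exact code: bootstrap servers, the group id, and the transactional.id naming scheme are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClientConfigs {

    // Transactional producer config; "my-app-" + instanceId is a placeholder scheme
    static Properties producerProps(String instanceId) {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-" + instanceId); // unique per producer
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // implied by transactions
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return p;
    }

    // Consumer config for the transactional pipeline
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        p.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // hide aborted transactions
        p.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets go through the producer
        p.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        p.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return p;
    }
}
```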
Here's what happened:
Instance 1 connected to Kafka and started consuming messages, but none of its polls returned records from partition X; instance 2 was processing that partition. In the __consumer_offsets
topic I can see that the commits for this partition went through (I commit offsets through the producer via sendOffsetsToTransaction, as you would with a transactional producer; see the loop sketch at the end of the post).
At 10:35:36 the producer committed the offset of the last record from partition X.
However, for some reason, at 10:36:31 the other instance received a batch containing the first records from partition X, and as a result the partition was processed twice.
What might be the reason for this behavior? I'm not using Spring or any other Kafka adapters.
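For reference, the processing loop follows the usual transactional consume-transform-produce pattern. Below is a simplified sketch rather than my exact code: the outbound topic name is a placeholder, error handling is abbreviated, and I'm assuming the sendOffsetsToTransaction overload that takes consumer.groupMetadata() (Kafka 2.5+); the older overload takes the group id string instead.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;

public class ProcessingLoop {

    // producer.initTransactions() has already been called once at startup
    static void runLoop(KafkaConsumer<String, String> consumer,
                        KafkaProducer<String, String> producer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // transform and write to the outbound topic ("out-topic" is a placeholder)
                    producer.send(new ProducerRecord<>("out-topic", record.key(), record.value()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Offsets are committed through the producer, inside the transaction,
                // so they become visible atomically with the produced records.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (ProducerFencedException fatal) {
                // another producer with the same transactional.id took over; this one must close
                producer.close();
                throw fatal;
            } catch (Exception e) {
                // transaction aborted; a full implementation would seek the consumer back
                // to the last committed offsets before retrying (omitted here)
                producer.abortTransaction();
            }
        }
    }
}
```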