I'm currently trying to design a scalable consumer architecture for Kafka and I'm running into some issues with offset coordination. It is important for my use case that each message consumed from Kafka is processed exactly once.
Take the following as an illustration of the problem:
- Consumer retrieves message from Kafka
- Consumer processes message (business logic, yay!)
- Consumer finishes processing, increments local offset
- Consumer attempts to commit the offset back to Kafka
- This network call fails for whatever reason
- The above error, or anything else, causes the consumer to crash before the offset commit can be retried
- The system orchestrator brings up a replacement consumer, which then fetches the stale committed offset
- The same message is retrieved and re-processed (bad!) — see the sketch after this list
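For concreteness, here's a minimal sketch of the loop I have in mind (plain Java kafka-clients consumer with auto-commit disabled); the topic, group id, and `processRecord` are placeholders, not my real code. The crash window sits between `processRecord()` returning and `commitSync()` succeeding:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-consumer-group");   // placeholder group id
        props.put("enable.auto.commit", "false");     // commit manually, after processing
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    processRecord(record);             // business logic (side effects happen here)
                }
                // If the process crashes anywhere between processRecord() above and a
                // successful commitSync() below, the replacement consumer re-reads the
                // same records from the last committed offset.
                consumer.commitSync();
            }
        }
    }

    private static void processRecord(ConsumerRecord<String, String> record) {
        // placeholder for the actual business logic
        System.out.printf("processing offset %d: %s%n", record.offset(), record.value());
    }
}
```

As far as I can tell, this is at-least-once by construction: the commit is a separate, fallible step that only happens after the side effects have already occurred.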
Those of you with more distributed-systems experience than me have probably recognized that this is (more or less) the Two Generals problem applied to coordinating Kafka offsets with work results.
I've thought about committing the offset and the work result in the same (probably SQL) database transaction (sketched below), but that couples those implementations together and limits my data-store options: what do I do if I later move to a store without transactions? Another possible solution would be hashing each message and using a Bloom filter to probabilistically skip duplicates, but that adds complexity (and a false-positive risk) I'd prefer to avoid.
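For reference, here is a rough sketch of that first idea, which I believe is essentially the "store offsets outside Kafka" pattern: the result and the next offset are written in the same SQL transaction, and on partition assignment the consumer seeks to the offset recorded in the database rather than Kafka's committed offset. The JDBC URL, the `results` and `kafka_offsets` tables, and the Postgres-style upsert are all assumptions on my part, not a working implementation:

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class DbOffsetConsumer {
    public static void main(String[] args) throws SQLException {
        // Placeholder DSN and schema; 'results' and 'kafka_offsets' are hypothetical tables.
        Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app");
        db.setAutoCommit(false);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "db-offset-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("my-topic"), new ConsumerRebalanceListener() {
            @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Resume from the offsets stored in the DB, making the DB the source of
                // truth instead of Kafka's committed offsets.
                for (TopicPartition tp : partitions) {
                    consumer.seek(tp, loadOffset(db, tp));
                }
            }
        });

        while (true) {
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                // Result and offset become durable atomically: either both land or neither
                // does, so a crash can't leave a processed record with an unrecorded offset.
                try (PreparedStatement res = db.prepareStatement(
                         "INSERT INTO results (topic, kafka_partition, msg_offset, payload) VALUES (?, ?, ?, ?)");
                     PreparedStatement off = db.prepareStatement(
                         "INSERT INTO kafka_offsets (topic, kafka_partition, next_offset) VALUES (?, ?, ?) " +
                         "ON CONFLICT (topic, kafka_partition) DO UPDATE SET next_offset = EXCLUDED.next_offset")) {
                    res.setString(1, r.topic());
                    res.setInt(2, r.partition());
                    res.setLong(3, r.offset());
                    res.setString(4, r.value());
                    res.executeUpdate();
                    off.setString(1, r.topic());
                    off.setInt(2, r.partition());
                    off.setLong(3, r.offset() + 1);   // next offset to read after this record
                    off.executeUpdate();
                }
                db.commit();
            }
        }
    }

    static long loadOffset(Connection db, TopicPartition tp) {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND kafka_partition = ?")) {
            ps.setString(1, tp.topic());
            ps.setInt(2, tp.partition());
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;   // never seen this partition: start at 0
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The downside is exactly the coupling I mentioned: the offset bookkeeping now lives in the same store as the results, and the whole approach assumes that store supports transactions.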