
I am trying to achieve exactly-once consumption with the Kafka Consumer.
My requirement is:

  1. Read data from Topic
  2. Process the data [which involves calling another API]
  3. Write the response back to Kafka

I wanted to know whether exactly-once is possible in this scenario.

I know this use case is covered by the Kafka Streams API, but I wanted to know about the Producer/Consumer API. Also, suppose the consumer fails for some reason after processing the data (and the processing should be done only once) — what would be the best way to handle such cases? Can there be any continuation/checkpoint for them?

I understand that the Kafka Streams API is transactional over consume-process-produce. Here too, if the consumer crashes after calling the API, the flow would start again from the very beginning, right?

Mark Rotteveel
Raghav
  • Does this answer your question? [Kafka only once consumption guarantee](https://stackoverflow.com/questions/42165726/kafka-only-once-consumption-guarantee) – pringi May 09 '22 at 15:38
  • I have gone through the above link. It talks about storing the message ID in some external store; the problem with this is that even if we store the ID and the processing then fails, it should be retried, but it won't be. – Raghav May 09 '22 at 15:46

1 Answer


Yes; Spring for Apache Kafka supports exactly once semantics in the same way as Kafka Streams.

See

https://docs.spring.io/spring-kafka/docs/current/reference/html/#exactly-once

and

https://docs.spring.io/spring-kafka/docs/current/reference/html/#transactions

Bear in mind that "exactly once" means that the entire successful

consume -> process -> produce

is performed once. But, if the produce step fails (rolling back the transaction), then the consume -> process part is "at least once".

Therefore, you need to make the process part idempotent.
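Since the consume -> process part is only at-least-once, the process step has to tolerate redelivery of a record it already handled. A minimal self-contained sketch of one common way to do that — recording the last processed offset per topic-partition and skipping anything at or below it. All class and method names here are hypothetical, and an in-memory map stands in for the external store; in a real setup the offset write and the side effect should be committed in the same DB transaction:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentProcessor {

    // Stand-in for a DB table keyed by "topic-partition" -> last processed offset.
    private final Map<String, Long> processedOffsets = new HashMap<>();

    // Counts calls to the (simulated) external API, so duplicates are observable.
    final AtomicInteger apiCalls = new AtomicInteger();

    /**
     * Process a record only if its topic/partition/offset has not been seen before.
     * Returns true if the record was processed, false if it was a duplicate.
     */
    public boolean processOnce(String topic, int partition, long offset, String value) {
        String key = topic + "-" + partition;
        Long last = processedOffsets.get(key);
        if (last != null && offset <= last) {
            // Redelivered after a crash/rollback: the work was already done, skip it.
            return false;
        }
        callExternalApi(value);            // the non-transactional side effect
        processedOffsets.put(key, offset); // in production: same DB tx as the side effect
        return true;
    }

    private void callExternalApi(String value) {
        apiCalls.incrementAndGet();
    }
}
```

Note the residual risk mentioned below still applies: if the process crashes between the API call and the offset write, the record is redelivered and the API is called twice. Only a store that can commit both atomically closes that gap.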

Gary Russell
  • So if I understand correctly, if we fail at the processing step, the consumer will consume the same message again? So if I want exactly-once in a read->process kind of scenario [I don't have control over the external API call], then that is not possible? – Raghav May 09 '22 at 16:33
  • It is not possible; you won't know whether the process failed before or after the API call. This is true with Kafka Streams as well; it's a common misunderstanding — the term "exactly once" only applies to the entire sequence. The entire sequence is completed successfully exactly once, but the consume and consume+process steps are performed at least once. – Gary Russell May 09 '22 at 16:51
  • A common technique is to store the topic/partition/offset of the record in a DB if the process part was successful and skip it during redelivery. But there is still room for error (e.g. the API call was successful but storing the offset was not). – Gary Russell May 09 '22 at 16:53
  • Ah got it! Thanks Gary! I had one more question, https://stackoverflow.com/questions/72202595/how-to-commit-offset-after-deserialization-error-spring-kafka. Can you please help regarding that also? – Raghav May 11 '22 at 14:13