
I want to do all stateful operations in an external database instead of RocksDB. To do that, wherever a stateful operation is required, I am writing a custom Processor which performs the DB operation; the context#forward method then forwards the key-value pair written to the DB to downstream consumers, and it is finally written to a topic.
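
A minimal sketch of such a Processor, using the Processor API, might look like the following (the `DbClient` interface and its `upsert` method are hypothetical placeholders for whatever database access layer is used):

```java
import org.apache.kafka.streams.processor.AbstractProcessor;

// Hypothetical database access layer.
interface DbClient {
    void upsert(String key, String value);
}

// Custom Processor that writes each record to the external DB,
// then forwards it downstream (and ultimately to the output topic).
public class DbWritingProcessor extends AbstractProcessor<String, String> {

    private final DbClient db;

    public DbWritingProcessor(DbClient db) {
        this.db = db;
    }

    @Override
    public void process(String key, String value) {
        db.upsert(key, value);          // write to the external database first
        context().forward(key, value);  // then forward downstream
    }
}
```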

1 Answer


Kafka Streams provides exactly-once semantics only within Kafka topics.

You don't have such guarantees for writes to external systems such as databases.

The following scenario is possible:

  1. The custom Processor gets a message and writes it to the external system (DB).
  2. A fatal error occurs - the offset commit on the source topic is not made, and no record is passed downstream.
  3. The application is restarted.
  4. The same message is processed by the custom Processor again: it is written to the external system (DB) a second time, passed downstream, and the commit is then performed.

In the described scenario:

  • The external system (DB) gets the same message twice.

  • Within Kafka, exactly-once is achieved.

If you want to write messages to an external system (DB), it is better to use Kafka Connect (e.g. the JDBC Sink Connector). You could fully process messages using Kafka Streams and then use Kafka Connect to copy the data from the output topic to the database.
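
A minimal sketch of a JDBC Sink Connector configuration, assuming a Postgres target with placeholder connection details and table settings, could look like this:

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "output-topic",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "connection.user": "user",
    "connection.password": "password",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "auto.create": "true"
  }
}
```

Note that `insert.mode=upsert` makes the sink idempotent, so records redelivered after a restart do not produce duplicate rows.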

More regarding exactly-once semantics can be found here:

  • If the custom Processor takes care of the same message arriving twice (in the application-restart scenario) while writing to the DB, then it should not be a problem. – Ambrish Tiwari Sep 08 '19 at 08:22
  • @AmbrishTiwari, I think performance will be low if you do that for each message. Kafka Connect makes bulk writes to the external system. If you really need to do that with Kafka Streams, you should consider using a `Punctuator` (see the sketch after these comments). – Bartosz Wardziński Sep 08 '19 at 08:29
  • External stores would be problematic, because the external store cannot be rolled back in case of an error. Hence, exactly-once currently does not support external stores. – Matthias J. Sax Sep 09 '19 at 15:37
  • Just a thought - can the Kafka transactional API be extended to chain with a DB transaction? I'm not sure what the performance impact would be. – aishwarya kumar Sep 12 '19 at 16:37
  • @aishwaryakumar, what do you mean by chaining with a DB transaction? A distributed transaction? That would certainly hurt performance and add complexity. – Bartosz Wardziński Sep 12 '19 at 20:00
  • Yes, a distributed transaction; it looks like Spring offers it: https://stackoverflow.com/questions/47354521/transaction-synchronization-in-spring-kafka. – aishwarya kumar Sep 12 '19 at 20:26
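
As a rough illustration of the `Punctuator` suggestion above, a Processor could buffer records in memory and flush them in bulk on a wall-clock schedule. Again, `DbClient` and its `bulkUpsert` method are hypothetical placeholders; note that buffered records are lost on a crash, so this trades durability for throughput:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

// Hypothetical database access layer with a bulk write method.
interface DbClient {
    void bulkUpsert(Map<String, String> rows);
}

// Buffers records and flushes them to the DB in bulk via a Punctuator.
public class BatchingDbProcessor extends AbstractProcessor<String, String> {

    private final DbClient db;
    private final Map<String, String> buffer = new HashMap<>();

    public BatchingDbProcessor(DbClient db) {
        this.db = db;
    }

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        // Flush the buffer every 10 seconds of wall-clock time.
        context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME,
                timestamp -> {
                    if (!buffer.isEmpty()) {
                        db.bulkUpsert(buffer);  // one bulk write instead of one per message
                        buffer.forEach(context::forward);
                        buffer.clear();
                    }
                });
    }

    @Override
    public void process(String key, String value) {
        buffer.put(key, value);  // buffer instead of writing immediately
    }
}
```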