In my company we use Kafka extensively, but we have been using a relational database to store the results of several intermediate transformations and aggregations, for fault-tolerance reasons. Now we are exploring Kafka Streams as a more natural way to do this. Often our needs are quite simple - one such case is:
- Listen to an input topic of `<K1,V1>, <K2,V2>, <K1,V2>, <K1,V3>...`
- For each record, perform some high-latency operation (call a remote service)
- If, by the time `<K1,V1>` is processed, both `<K1,V2>` and `<K1,V3>` have been produced, then I should process only V3, as V2 has already become stale
To achieve this, I am reading the topic as a KTable. The code looks like this:
```java
KStreamBuilder builder = new KStreamBuilder();
KTable<String, String> kTable = builder.table("input-topic");
kTable.toStream().foreach((k, v) -> client.post(v));
return builder;
```
This works as expected, but it is not clear to me how Kafka achieves this automagically. I assumed that Kafka creates internal topics for this, but I do not see any internal topics created, and the Javadoc for the method seems to confirm that observation. But then I came across this official page, which seems to suggest that Kafka uses a separate data store (RocksDB) along with a changelog topic.
Now I am confused about the circumstances under which a changelog topic is created. My questions are:
- If the default behaviour of the state store is fault-tolerant, as the official page suggests, then where is that state stored? In RocksDB, in the changelog topic, or both?
- What are the implications of relying on RocksDB in production? (EDITED)
  - As I understand it, the dependency on RocksDB is transparent (just a jar file), and RocksDB stores its data on the local file system. But this also means that, in our case, the application will maintain a copy of the sharded data on the storage where the application is running. When we replace a remote database with a KTable, this has storage implications, and that is my point.
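For context, the location of that local RocksDB state is controlled by the `state.dir` setting in the streams configuration. Below is a minimal sketch of how I set it up; the application id, broker address, and directory path are illustrative assumptions, not values from my real deployment:

```java
import java.util.Properties;

public class StreamsPropsSketch {
    public static void main(String[] args) {
        // Kafka Streams reads these keys via StreamsConfig.
        // "state.dir" is where RocksDB keeps its local files.
        Properties props = new Properties();
        props.put("application.id", "ktable-example");     // assumed name
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker
        props.put("state.dir", "/var/lib/kafka-streams");  // assumed path
        // The KafkaStreams instance would be built from these properties.
        System.out.println(props.getProperty("state.dir"));
    }
}
```

Pointing `state.dir` at a dedicated volume is one way to plan for the extra storage the local copy requires.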
- Will future Kafka releases ensure that RocksDB continues to work on various platforms? (It seems to be platform-dependent and not written in Java.)
- Does it make sense to make `input-topic` log compacted?
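If compaction does make sense, my understanding is that it can be enabled on the existing topic with `kafka-configs.sh`; the ZooKeeper address below is an illustrative assumption:

```shell
# Enable log compaction on the input topic (0.11.x-era syntax).
# localhost:2181 is an assumed ZooKeeper address.
kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name input-topic \
  --alter --add-config cleanup.policy=compact
```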
I am using Kafka 0.11.0.