A KTable in Kafka Streams is generally a partitioned store backed by:
- A changelog topic on the Kafka brokers (for resilience).
- A RocksDB instance for each input partition (i.e. for each Kafka Streams task).
When you query a KTable, you (generally) need to do two things:
- Determine the Streams instance that hosts the partition for your key, via `KafkaStreams#queryMetadataForKey()`.
- Use Interactive Queries to contact that host, which then does (to simplify things) a `store.get(key)`.
The `store.get()` is a lookup on RocksDB.
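To make the two steps concrete, here is a toy model in plain Java with no Kafka dependency. It is only a sketch: the class `RoutedLookupSketch`, the host names, and the use of `String.hashCode()` are all made up for illustration. In a real app, step 1 is the `KafkaStreams#queryMetadataForKey()` call and step 2 is a network request to the owning host.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, no Kafka dependency).
class RoutedLookupSketch {
    final int numPartitions;
    // Which host owns which partition.
    final Map<Integer, String> partitionToHost = new HashMap<>();
    // Each host's local store; a plain map stands in for RocksDB.
    final Map<String, Map<String, String>> hostToStore = new HashMap<>();

    RoutedLookupSketch(int numPartitions, String... hosts) {
        this.numPartitions = numPartitions;
        for (int p = 0; p < numPartitions; p++) {
            String host = hosts[p % hosts.length]; // round-robin assignment
            partitionToHost.put(p, host);
            hostToStore.computeIfAbsent(host, h -> new HashMap<>());
        }
    }

    // Stand-in for Kafka's partitioner, which hashes the serialized key bytes.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Step 1: find the host that owns the key's partition.
    String hostFor(String key) {
        return partitionToHost.get(partitionFor(key));
    }

    void put(String key, String value) {
        hostToStore.get(hostFor(key)).put(key, value);
    }

    // Step 2: "contact" that host, which does a local store.get(key).
    String get(String key) {
        return hostToStore.get(hostFor(key)).get(key);
    }

    public static void main(String[] args) {
        RoutedLookupSketch cluster =
            new RoutedLookupSketch(4, "host-a:8080", "host-b:8080");
        cluster.put("user-42", "alice");
        System.out.println(cluster.hostFor("user-42") + " -> " + cluster.get("user-42"));
    }
}
```

The point of the model is that a key always maps to exactly one partition, so only one host can answer for it. That is why the metadata lookup has to come first.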
Every time your app writes a record to RocksDB, it is also written to your changelog topic. That way, if one of your streams app instances crashes, another instance can "replay" the changelog topic to build the state that used to live on the now-dead instance.
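The write-then-replay idea can be sketched in plain Java as follows. This is a simplified model, not Kafka Streams code: `ChangelogSketch`, `Change`, and `Instance` are hypothetical names, a `HashMap` stands in for RocksDB, and a `List` stands in for the changelog topic. A null value is treated as a delete marker, mirroring how Kafka uses tombstones.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (hypothetical names): every local write is also appended
// to a shared changelog, and a new instance rebuilds state by replaying it.
class ChangelogSketch {
    static final class Change {
        final String key;
        final String value; // null means "deleted" (a tombstone)
        Change(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Instance {
        final Map<String, String> localStore = new HashMap<>(); // stand-in for RocksDB
        final List<Change> changelog;                            // stand-in for the topic

        Instance(List<Change> changelog) {
            this.changelog = changelog;
        }

        void put(String key, String value) {
            localStore.put(key, value);
            changelog.add(new Change(key, value));
        }

        void delete(String key) {
            localStore.remove(key);
            changelog.add(new Change(key, null));
        }

        // "Restoration": replay the changelog from the start, in order.
        void restore() {
            for (Change c : changelog) {
                if (c.value == null) {
                    localStore.remove(c.key);
                } else {
                    localStore.put(c.key, c.value);
                }
            }
        }
    }
}
```

If instance A writes to its store and then dies, a fresh instance B constructed over the same changelog ends up with identical state after `restore()`. The replay is linear in the length of the changelog, which hints at why restoring a large store is slow.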
That process is called "restoration", and it can take a long time. To get faster failover, you can enable Standby Tasks. A Standby Task is just a task on a streams instance that reads the changelog in real time, so that if another streams instance crashes, the data is already "hot" and ready to go on a currently-live streams instance.
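Standby Tasks are turned on through the `num.standby.replicas` setting, which defaults to 0. A minimal sketch of the properties you would hand to your Streams app is below. The application id and bootstrap address are placeholders, and the helper class `StandbyConfigSketch` is just for illustration.

```java
import java.util.Properties;

// Builds Kafka Streams properties with standby replicas enabled.
// Plain string keys are used so the sketch needs no Kafka dependency.
class StandbyConfigSketch {
    static Properties streamsProps(int standbyReplicas) {
        Properties props = new Properties();
        props.put("application.id", "my-ktable-app");     // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Number of extra "hot" copies kept for each state store task.
        props.put("num.standby.replicas", Integer.toString(standbyReplicas));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps(1));
    }
}
```

Each standby replica costs extra disk and network on the instance that hosts it, so 1 is a common starting point rather than a rule.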
A note on RocksDB performance: it tends to degrade once you get past 30 GB of data in one store. Since each input partition gets its own store, use that as a guideline for how many input partitions you need.
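As a rough worked example of that guideline, the arithmetic is just a ceiling division. This sketch assumes keys are spread evenly across partitions, which real data often is not, so treat the result as a lower bound. The class and method names are made up for illustration.

```java
// Back-of-the-envelope partition sizing from the ~30 GB-per-store rule of thumb.
class PartitionSizingSketch {
    static final long MAX_GB_PER_STORE = 30; // guideline from the text, not a hard limit

    // Smallest partition count that keeps each store at or under the guideline,
    // assuming state is evenly distributed across partitions.
    static int minPartitions(long expectedStateGb) {
        if (expectedStateGb <= 0) {
            return 1;
        }
        return (int) ((expectedStateGb + MAX_GB_PER_STORE - 1) / MAX_GB_PER_STORE);
    }

    public static void main(String[] args) {
        System.out.println(minPartitions(300)); // 10
        System.out.println(minPartitions(301)); // 11
    }
}
```

Changing the partition count of an existing input topic later is painful, so it is usually worth sizing for the state you expect to have eventually, not what you have today.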