Apache Samza uses RocksDB as the storage engine for local storage. This allows for stateful stream processing and here's a very good overview.
My use case:
- I have multiple streams of events that I wish to process taken from a system such as Apache Kafka.
- These events create state - the state I wish to track is based on previous messages received.
- I wish to generate new stream events based on the calculated state.
- The input stream events are highly connected and a graph such as OrientDB / Neo4J is the ideal medium for querying the data to create the new stream events.
My question:
Is it possible to use a non-KV store as the local storage for Samza? Has anyone ever done this with OrientDB / Neo4J and is anyone aware of an example?