
I have a Samza job with a number of tasks, each of which holds some state in its embedded store. I want to expose this store for reading to the outside world via some kind of RPC mechanism. What would be the best way to do this?

Here is the one paragraph in the Samza documentation about it:

Samza does not currently have an equivalent API to DRPC, 
but you can build it yourself using Samza’s stream 
processing primitives.

The only solution that comes to mind is to have my tasks, in addition to their normal processing, consume request messages carrying correlation IDs from a special request topic and publish response messages with the same correlation IDs to a special response topic. That is essentially RPC over Kafka, which seems suboptimal to me.
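To make the idea concrete, here is a minimal sketch of the correlation-ID pattern. It is not Samza code: plain in-memory queues stand in for the two Kafka topics, and every name in it (`request_topic`, `response_topic`, `Task`, `send_request`, `await_response`) is hypothetical and chosen only for illustration.

```python
import uuid
from collections import deque

# In-memory stand-ins for the request and response topics (hypothetical names).
request_topic = deque()
response_topic = deque()


class Task:
    """Stand-in for a task that owns a local key-value store."""

    def __init__(self, store):
        self.store = store

    def process_request(self, message):
        # Look up the key locally and echo the correlation ID back,
        # so the caller can match the reply to its request.
        value = self.store.get(message["key"])
        response_topic.append({
            "correlation_id": message["correlation_id"],
            "key": message["key"],
            "value": value,
        })


def send_request(key):
    """Client side: publish a lookup request and return its correlation ID."""
    correlation_id = str(uuid.uuid4())
    request_topic.append({"correlation_id": correlation_id, "key": key})
    return correlation_id


def await_response(correlation_id):
    """Client side: scan the response topic for the matching reply."""
    for message in response_topic:
        if message["correlation_id"] == correlation_id:
            return message["value"]
    return None


task = Task({"user-1": 42})
cid = send_request("user-1")
while request_topic:
    task.process_request(request_topic.popleft())
result = await_response(cid)
```

In a real job the request topic would presumably have to be partitioned by the same key as the state, so that each request reaches the task whose store holds that key.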

Any thoughts are welcome!

Vladimir Lebedev

1 Answer


As far as I remember, the embedded store is backed up to a Kafka topic: when you write something to the store, a message is produced to that topic. You can therefore consume this topic, "clone" the embedded store into a separate database, and query that database. Alternatively, you can use the database directly instead of the embedded store, but that approach could lead to performance issues in your Samza job.
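A rough sketch of the "clone" idea, with a plain list standing in for the backup topic and a dict standing in for the external database. It assumes each message is a key/value pair and that a `None` value marks a deletion; both are assumptions about the topic's format, and `apply_changelog` is a hypothetical helper, not part of Samza.

```python
def apply_changelog(changelog, replica=None):
    """Replay (key, value) messages in order to rebuild a copy of the store."""
    replica = {} if replica is None else replica
    for key, value in changelog:
        if value is None:
            # Assumed convention: a None value means the key was deleted.
            replica.pop(key, None)
        else:
            # Later writes to the same key overwrite earlier ones.
            replica[key] = value
    return replica


# Messages in the order they were produced to the backup topic.
changelog = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
replica = apply_changelog(changelog)
```

Readers then query `replica` (in practice, the external database) rather than the job itself, so lookups see the state as of the last message consumed.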

Lukáš Havrlant