
Question

I would like to be able to view the persisted state associated with function addresses in Flink Stateful Functions. What is the best option for achieving this?

Details

  • By "persisted state", I mean the data that StateFun passes to our function in context.storage (Node.js).
  • We're using the RocksDB state backend.
  • We only need to view persisted state for:
    • Ad hoc debugging purposes. E.g., a developer needs to know the state of a particular function address.
    • Auditing data consistency between systems. E.g., a QA tool that verifies that our StateFun persisted state is consistent with the data in our transactional database.
  • Ideally we would be able to do this in a standalone program, but being able to do this in a Flink job would be workable.
  • Our functions are implemented in Node.js, but I assume I'll need to use Java to view persisted state.
  • We're using StateFun 3.3-SNAPSHOT, which uses Flink 1.15.2.

Options that I have investigated

State Processor API

I am currently trying to use the State Processor API to read the functions_uid1 operator's keyed state from a savepoint. My code is at https://github.com/gabrielmdeal/flink-data-viewer

Cons:

  • My job fails with this error: The new key serializer (org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer@48a56a78) must be compatible with the previous key serializer (org.apache.flink.api.common.typeutils.base.StringSerializer@2e58735)
  • Must read from a savepoint instead of from the most recent state. Workable but not ideal.
  • Must run as a Flink job instead of in a standalone process. Workable but not ideal.
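For reference, here is a minimal sketch of what I am attempting, using Flink 1.15's SavepointReader and keying on STRING rather than byte[] (the serializer error above suggests the savepoint's keys are strings). The state name "my_value", the savepoint path, and the assumption that StateFun stores values as raw byte[] are guesses on my part:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StateFunSavepointDump {

  // Reads one persisted value per key. "my_value" is a guess; it must match
  // whatever name StateFun registered the descriptor under in FlinkState.
  static class ReaderFn extends KeyedStateReaderFunction<String, String> {
    private transient ValueState<byte[]> value;

    @Override
    public void open(Configuration parameters) {
      value = getRuntimeContext().getState(new ValueStateDescriptor<>(
          "my_value", PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO));
    }

    @Override
    public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
      byte[] bytes = value.value();
      out.collect(key + " -> " + (bytes == null ? "null" : bytes.length + " bytes"));
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    SavepointReader savepoint = SavepointReader.read(
        env, "file:///tmp/savepoint-placeholder", new EmbeddedRocksDBStateBackend());

    // Keying on STRING here (not byte[]) is what I hope avoids the
    // "new key serializer must be compatible" failure.
    DataStream<String> dump = savepoint.readKeyedState(
        "functions_uid1", new ReaderFn(), Types.STRING, Types.STRING);
    dump.print();
    env.execute("statefun-savepoint-dump");
  }
}
```

Even if this works, the savepoint-only and Flink-job cons above still apply.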

Queryable State API

The Queryable State API provides most of the functionality I want.

Cons:

  • It is in the "features phasing out" section of the Flink Roadmap.
  • It requires one lookup per function address, with no way to iterate over all keys, so it would be very slow given the number of function addresses in our data.
  • I don't think StateFun enables queryable state on its data. This could be done by calling setQueryable() on the relevant descriptor in FlinkState.java and building a custom Docker image via build-stateful-functions.sh.
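If queryable state were enabled, a single-address lookup would look roughly like this; the proxy host, port, job ID, queryable-state name, key, and byte[] value type are all placeholders, and this assumes the setQueryable() patch described above:

```java
import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryOneAddress {
  public static void main(String[] args) throws Exception {
    // Placeholder proxy host/port and job ID.
    QueryableStateClient client = new QueryableStateClient("task-manager-host", 9069);
    JobID jobId = JobID.fromHexString("00000000000000000000000000000000");

    ValueStateDescriptor<byte[]> descriptor = new ValueStateDescriptor<>(
        "my_value", PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);

    // One round trip per function address -- there is no way to scan keys.
    CompletableFuture<ValueState<byte[]>> result = client.getKvState(
        jobId, "my_value", "the-function-key", Types.STRING, descriptor);

    byte[] bytes = result.join().value();
    System.out.println(bytes == null ? "null" : bytes.length + " bytes");
    client.shutdownAndWait();
  }
}
```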

Publish a "log storage" event to our function

We do have a Kafka event that causes our function to log the data in context.storage. This is helpful for ad hoc debugging.

Cons:

  • It breaks down when events are not reaching our function, e.g., due to a poison-pill message or scaling issues.
  • We would like to be able to write QA code that iterates over all persisted state.

Read directly via RocksDB

I have made only tentative attempts to read the savepoint data via the ldb tool or via a RocksDB API. I know next to nothing about RocksDB, but I suspect this approach would require detailed knowledge of how Flink lays out its state in RocksDB.
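For the record, this is roughly what I tried: listing the column families in a snapshot directory with the RocksDB Java API. This assumes the snapshot is in RocksDB's native format (a checkpoint or native savepoint, not a canonical savepoint) and that the path argument points at the RocksDB instance directory. As far as I understand it, Flink creates one column family per registered state and prefixes every key with a key-group index, so this only gets me raw bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.DBOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public class RocksDbPeek {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    String path = args[0]; // path to the RocksDB instance directory

    // One column family per registered Flink state, plus "default".
    List<ColumnFamilyDescriptor> descriptors = new ArrayList<>();
    try (Options options = new Options()) {
      for (byte[] name : RocksDB.listColumnFamilies(options, path)) {
        System.out.println("column family: " + new String(name, StandardCharsets.UTF_8));
        descriptors.add(new ColumnFamilyDescriptor(name));
      }
    }

    List<ColumnFamilyHandle> handles = new ArrayList<>();
    try (DBOptions dbOptions = new DBOptions();
         RocksDB db = RocksDB.openReadOnly(dbOptions, path, descriptors, handles)) {
      // Keys are Flink-internal (key-group prefix + serialized key +
      // namespace), so only raw sizes are printed here.
      try (RocksIterator it = db.newIterator(handles.get(0))) {
        for (it.seekToFirst(); it.isValid(); it.next()) {
          System.out.println(it.key().length + " key bytes, " + it.value().length + " value bytes");
        }
      }
    }
  }
}
```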
