I'm creating an AWS Flink application in Java that streams from Iceberg, and I'm wondering whether Flink has a mechanism for restarting the stream from the last successfully processed snapshot-id if the whole application goes down. Should I sink the snapshot-id to a database, or is there a better solution?
Expected scenario:
- the Flink job streams messages from an Iceberg table (the application is scalable and can run in many processes) and stores results in Kinesis and another Iceberg table
- the application dies unexpectedly
- I start the application again
- it continues processing messages without any data loss and without any manual intervention
I don't use the dedicated IcebergSource; perhaps that implementation could solve it somehow. Right now I'm using FlinkSource. Both source implementations have a method to set the snapshot-id, but it has to be set manually and the value has to be stored somewhere during processing. Is there any way to avoid that and rely on an internal Flink mechanism instead?
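For context, this is roughly how I wire the source today (a minimal sketch; the table path, checkpoint interval, and the stored snapshot-id value are placeholders, and reading the snapshot-id back "manually" is exactly the part I'd like to avoid):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.FlinkSource;

public class IcebergStreamJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing is enabled, so source state is snapshotted periodically.
        env.enableCheckpointing(60_000L);

        // Placeholder warehouse location.
        TableLoader tableLoader =
                TableLoader.fromHadoopTable("s3://my-bucket/warehouse/db/events");

        // The manual step in question: persisted somewhere (e.g. a database)
        // and read back on startup to pin the starting position.
        long lastProcessedSnapshotId = 1234567890L;

        DataStream<RowData> stream = FlinkSource.forRowData()
                .env(env)
                .tableLoader(tableLoader)
                .streaming(true)
                .startSnapshotId(lastProcessedSnapshotId)
                .build();

        stream.print();
        env.execute("iceberg-stream");
    }
}
```

This runs only against a Flink cluster with the Iceberg runtime on the classpath, so treat it as wiring for discussion rather than a self-contained program.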