Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exactly once.
I am trying to understand the answer to the following question: Flink exactly-once message processing
Does this mean that the Flink sink will produce duplicate events to an external system such as Cassandra?
For example:
1 - I have the following flow: source -> flatMap with state -> sink, with the snapshot (checkpoint) interval configured to 20 seconds.
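To make the setup concrete, here is a minimal sketch of what such a job could look like. The socket source, the per-key counting state, and print() standing in for a Cassandra sink are all assumptions for illustration; only the shape (source -> stateful flatMap -> sink, 20-second checkpoint interval) matches my actual job.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class CheckpointingExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Snapshot all operator state every 20 seconds
            env.enableCheckpointing(20_000, CheckpointingMode.EXACTLY_ONCE);

            env.socketTextStream("localhost", 9999)   // source (assumed host/port)
               .keyBy(line -> line)                   // keyed state requires a keyBy
               .flatMap(new CountingFlatMap())        // flatMap with state
               .print();                              // stand-in sink; the real job writes to Cassandra

            env.execute("checkpointing example");
        }

        /** Stateful flatMap: emits "value -> number of times seen so far" per key. */
        public static class CountingFlatMap extends RichFlatMapFunction<String, String> {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void flatMap(String value, Collector<String> out) throws Exception {
                long c = count.value() == null ? 1L : count.value() + 1;
                count.update(c);
                out.collect(value + " -> " + c);
            }
        }
    }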
What will happen if the task manager goes down (is killed) between two snapshots, i.e. 10 seconds after the last snapshot and 10 seconds before the next one?
What I know is that Flink will restart the job from the last snapshot.
In this case, will the sink reprocess all the records that were already processed between the last snapshot and the failure?