We are developing a pipeline in apache flink (datastream API) that needs to sends its messages to an external system using API calls. Sometimes such an API call will fail, in this case our message needs some extra treatment (and/or a retry).
We had a few options for doing this:
We
map()
our stream through a function that does the API call and get the result of the API call returned, so we can act upon failures subsequently (this was my original idea, and why i did this: flink scala map with dead letter queue)We write a custom sink function that does the same.
However, both options have problems i think:
With the
map()
approach i won't be able to get exactly once (or at most once which would also be fine) semantics since flink is free to re-execute pieces of pipelines after recovering from a crash in order to get the state up to date.With the custom sink approach i can't get a stream of failed API calls for further processing: a sink is a dead end from the flink APPs point of view.
Is there a better solution for this problem ?