0

i read about flink`s window assigners over here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-assigners , but i cant find any solution for my problem.

as part of my project i need a windowing that the timer will start given the first element of the key and will be closed and set ready for processing after X minutes. for example:

first keyA comes at (hh:mm:ss) 00:00:02, i want all keyA will be windowing until 00:01:02, and then the timer of 1 minutes will start again only when keyA will be given as input.

Is it possible to do something like that in flink? is there a workaround? hope i made it clear enough.

David Anderson
  • 39,434
  • 4
  • 33
  • 60
ShemTov
  • 687
  • 3
  • 8

1 Answers1

0

Implementing keyed windows that are aligned with the first event, rather than with the epoch, is quite difficult, in general, which I believe is why this isn't supported by Flink's window API. The problem is that with an out-of-order stream using event time processing, as earlier events arrive you may need to revise your notion of when the window began, and when it should end. For example, if the first keyA arrives at 00:00:02, but then some time later an event with keyA arrives with a timestamp of 00:00:01, now suddenly the window should end at 00:01:01, rather than 00:01:02. And if the out-of-orderness is large compared to the window length, handling this becomes quite complex -- imagine, for example, that the event from 00:00:01 arrives 2 minutes after the event from 00:00:02.

Rather than trying to implement this with the window API, I would use a KeyedProcessFunction. If you only need to support processing time windows, then these concerns about out-of-orderness do not apply, and the solution can be fairly simple. It suffices to keep one object in keyed state, which might be a list holding all of the events in the window, or a counter or other aggregator, depending on what you're trying to accomplish.

When an event arrives, if the state (for this key) is null, then there is no open window for this key. Initialize the state (i.e., create a new, empty list, or set the counter to zero), and create a Timer to fire at the appropriate time. Then regardless of whether the state had been null, add the incoming event to the state (i.e., append it to the list, or increment the counter).

When the timer fires, emit the window's result and reset the state to null.

If, on the other hand, you want to do this with event time windows, first sort the stream and then use the same approach. Note that you won't be able to handle late events, so plan your watermarking accordingly (reducing the likelihood of late events to a manageable level), or go for a more complex implementation.

David Anderson
  • 39,434
  • 4
  • 33
  • 60