This question covers how to sort an out-of-order stream using Flink SQL, but I would rather use the DataStream API. One solution is a ProcessFunction that uses a PriorityQueue to buffer events until the watermark indicates they are no longer out-of-order, but this performs poorly with the RocksDB state backend: each access to the PriorityQueue requires ser/de of the entire queue. How can I do this efficiently regardless of which state backend is in use?
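For reference, a minimal sketch of the PriorityQueue approach I mean (the Event type and its getTimestamp() accessor are hypothetical stand-ins for the real stream type):

    import java.util.Comparator;
    import java.util.PriorityQueue;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Sketch only: Event is a hypothetical stand-in for the actual stream type
    public class PriorityQueueSorter extends KeyedProcessFunction<String, Event, Event> {

        private ValueState<PriorityQueue<Event>> queueState;

        @Override
        public void open(Configuration parameters) {
            // PriorityQueue has no built-in Flink serializer, so this falls back to Kryo
            queueState = getRuntimeContext().getState(new ValueStateDescriptor<>(
                    "sort-queue",
                    TypeInformation.of(new TypeHint<PriorityQueue<Event>>() {})));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
            PriorityQueue<Event> queue = queueState.value(); // with RocksDB: deserializes the whole queue
            if (queue == null) {
                queue = new PriorityQueue<>(Comparator.comparingLong(Event::getTimestamp));
            }
            queue.add(event);
            queueState.update(queue); // with RocksDB: re-serializes the whole queue, on every event
            ctx.timerService().registerEventTimeTimer(event.getTimestamp());
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
            PriorityQueue<Event> queue = queueState.value();
            long watermark = ctx.timerService().currentWatermark();
            // emit everything the watermark now guarantees to be in order
            while (queue != null && !queue.isEmpty() && queue.peek().getTimestamp() <= watermark) {
                out.collect(queue.poll());
            }
            queueState.update(queue);
        }
    }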
2 Answers
A better approach (which is more-or-less what is done internally by Flink's SQL and CEP libraries) is to buffer the out-of-order stream in MapState, as follows:
If you are sorting each key independently, then first key the stream. Otherwise, for a global sort, key the stream by a constant so that you can use a KeyedProcessFunction to implement the sorting.
In the open method of that process function, instantiate a MapState object, where the keys are timestamps and the values are lists of stream elements all having the same timestamp.
In the processElement method:
- If an event is late, either drop it or send it to a side output
- Otherwise, append the event to the map entry corresponding to its timestamp
- Register an event time timer for this event's timestamp
When onTimer is called, the entries in the map for that timestamp are ready to be released as part of the sorted stream, because the current watermark now indicates that all earlier events should have already been processed. Don't forget to clear the entry in the map after sending the events downstream.
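Here is a minimal sketch of that pattern (the Event type is a hypothetical stand-in, and the String key type is arbitrary). Since each MapState entry holds only the events for a single timestamp, a RocksDB access only ser/des that one small list rather than the entire buffer:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Sketch only: Event is a hypothetical stand-in; assumes timestamps have been assigned upstream
    public class MapStateSorter extends KeyedProcessFunction<String, Event, Event> {

        private MapState<Long, List<Event>> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getMapState(new MapStateDescriptor<>(
                    "sort-buffer",
                    Types.LONG,
                    Types.LIST(TypeInformation.of(Event.class))));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
            long ts = ctx.timestamp();
            if (ts <= ctx.timerService().currentWatermark()) {
                // late event: drop it here, or route it to a side output instead
            } else {
                List<Event> events = buffer.get(ts);
                if (events == null) {
                    events = new ArrayList<>();
                }
                events.add(event);
                buffer.put(ts, events);
                // timers are deduplicated per key and timestamp, so this registers
                // at most one timer per distinct timestamp
                ctx.timerService().registerEventTimeTimer(ts);
            }
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
            // the watermark has passed this timestamp: these events can now be emitted in order
            for (Event event : buffer.get(timestamp)) {
                out.collect(event);
            }
            buffer.remove(timestamp);
        }
    }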

- Why does sorting need to happen in a KeyedProcessFunction? Why can't a regular ProcessFunction be used for a global sort? – ybudweiser Mar 12 '20 at 19:30
- Because only a KeyedProcessFunction can use event time timers, and because it's convenient to be able to use keyed state. But you could implement a global sort using a custom operator along with operator state. – David Anderson Mar 13 '20 at 14:13
- So the value type is presumably an ArrayList. Does Flink do ser/de for every element appended to the list? Is it costly? – Robin Xing Feb 09 '22 at 06:22
- Yes, that is potentially costly, but only if there are a significant number of hash collisions. – David Anderson Feb 09 '22 at 09:57
Unfortunately, the solution with timers did not work for us: it led to checkpoints failing because of the huge number of timers being generated. As an alternative, we did the sort with tumbling windows, which register only one timer per window and key rather than one per distinct event timestamp:
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.Duration;
import java.util.stream.StreamSupport;
public class EventSortJob {
    private static final Duration ALLOWED_LATENESS = Duration.ofMillis(2);
    private static final Duration SORT_WINDOW_SIZE = Duration.ofMillis(5);
    private static final Logger LOGGER = LoggerFactory.getLogger(EventSortJob.class);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        SingleOutputStreamOperator<Integer> source = env
                .fromElements(0, 1, 2, 10, 9, 8, 3, 5, 4, 7, 6)
                .assignTimestampsAndWatermarks(
                        new WatermarkStrategy<Integer>() {
                            @Override public WatermarkGenerator<Integer> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                                return new WatermarkGenerator<Integer>() {
                                    private long watermark = Long.MIN_VALUE;

                                    // punctuated watermarks are used here for demonstration purposes only!!!
                                    @Override public void onEvent(Integer event, long eventTimestamp, WatermarkOutput output) {
                                        long potentialWatermark = event - ALLOWED_LATENESS.toMillis(); // delay watermark behind latest timestamp
                                        if (potentialWatermark > watermark) {
                                            watermark = potentialWatermark;
                                            output.emitWatermark(new Watermark(watermark));
                                            LOGGER.info("watermark = {}", watermark);
                                        }
                                    }

                                    // normally, periodic watermarks should be used
                                    @Override public void onPeriodicEmit(WatermarkOutput output) {}
                                };
                            }

                            @Override public TimestampAssigner<Integer> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
                                return (element, recordTimestamp) -> element; // for simplicity, element values are also timestamps (in millis)
                            }
                        }
                );

        OutputTag<Integer> lateEventsTag = new OutputTag<Integer>("lateEventsTag") {};
        SingleOutputStreamOperator<Integer> sorted = source
                .keyBy(v -> 1)
                .window(TumblingEventTimeWindows.of(Time.milliseconds(SORT_WINDOW_SIZE.toMillis())))
                .sideOutputLateData(lateEventsTag)
                .process(new ProcessWindowFunction<Integer, Integer, Integer, TimeWindow>() {
                    @Override public void process(
                            Integer integer,
                            ProcessWindowFunction<Integer, Integer, Integer, TimeWindow>.Context context,
                            Iterable<Integer> elements,
                            Collector<Integer> out
                    ) {
                        StreamSupport.stream(elements.spliterator(), false)
                                .sorted()
                                .forEachOrdered(out::collect);
                    }
                });

        source.keyBy(v -> 1).map(v -> String.format("orig: %d", v)).addSink(new PrintSinkFunction<>());
        sorted.addSink(new PrintSinkFunction<>());
        sorted.getSideOutput(lateEventsTag).keyBy(v -> 1).map(v -> String.format("late: %d", v)).addSink(new PrintSinkFunction<>());

        env.execute();
    }
}
