I have a Flink app (Flink version 1.9.2) with checkpointing enabled. When I run it on the Apache Flink platform, checkpoints always fail with the message "Checkpoint expired before completing". After checking the TaskManager thread dumps taken during a checkpoint, I found that the thread running two operators that call external services was always in the RUNNABLE state. Below are the design of these operators and the checkpoint configuration. How can I resolve this issue?
Operator design:

```java
import java.sql.Connection;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.configuration.Configuration;

public class OperatorA extends RichMapFunction<POJOA, POJOA> {
    private Connection connection;
    private String getCusipSourceIdPairsQuery;
    private String getCusipListQuery;
    private MapState<String, List<POJOX>> modifiedCusipState;
    private MapState<String, List<POJOX>> bwicMatchedModifiedCusipState;

    @Override
    public POJOA map(POJOA value) throws Exception {
        // create a local PreparedStatement on every call to map()
        // update/clear the two MapStates
        return value;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // initialize the JDBC connection and the TTL MapStates using GlobalJobParameters
    }

    @Override
    public void close() throws Exception {
        // close the JDBC connection
    }
}
```
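Before changing the design, it may help to confirm which operator actually holds the task thread. A minimal sketch using only the JDK: `LatencyTracker` is a hypothetical helper (not a Flink API) that wraps per-record logic and records the call count plus the average and maximum latency. Inside a real job, the same numbers could instead be reported through Flink's metric system.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper, not part of Flink: wraps per-record logic and records
// how long each call takes. Not thread-safe; a Flink map() runs on a single task thread.
final class LatencyTracker<T, R> implements Function<T, R> {
    private final Function<T, R> delegate;
    private long calls;
    private long totalNanos;
    private long maxNanos;

    LatencyTracker(Function<T, R> delegate) {
        this.delegate = delegate;
    }

    @Override
    public R apply(T input) {
        long start = System.nanoTime();
        try {
            return delegate.apply(input);
        } finally {
            // Record the elapsed time even when the delegate throws.
            long elapsed = System.nanoTime() - start;
            calls++;
            totalNanos += elapsed;
            maxNanos = Math.max(maxNanos, elapsed);
        }
    }

    long calls() {
        return calls;
    }

    long maxMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxNanos);
    }

    double avgMillis() {
        return calls == 0 ? 0.0 : totalNanos / 1_000_000.0 / calls;
    }
}
```

For example, wrapping the JDBC lookups of OperatorA and the REST call of OperatorB in separate trackers and logging `avgMillis()` periodically would show which one dominates the per-record time.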
```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class OperatorB extends RichMapFunction<POJOA, POJOA> {
    private MyServiceA serviceA;
    private MyServiceB serviceB;

    @Override
    public POJOA map(POJOA value) throws Exception {
        // call a RESTful GET API of serviceB, which returns an XML response with about 500 fields
        // use serviceA to parse the XML document and populate the fields of value
        return value;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // initialize a local JDBC connection and PreparedStatement using GlobalJobParameters,
        // then use the query results to initialize serviceA
        // initialize serviceB
    }
}
```
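One thing worth checking in OperatorB: a thread waiting on a socket read (for example, a slow REST response) is usually reported as RUNNABLE in a Java thread dump, because the wait happens inside native code. If the HTTP client behind serviceB has no read timeout, one slow response can hold the task thread, and the checkpoint barriers queued behind its records, for as long as the server takes. The sketch below assumes the call could be reduced to a plain `HttpURLConnection` GET; `BoundedHttpGet` and the timeout values are hypothetical, and the real `MyServiceB` client may expose its own timeout settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for serviceB's REST call: a GET whose connect and read
// phases are both bounded, so a stuck server produces a SocketTimeoutException
// instead of blocking the calling thread indefinitely.
final class BoundedHttpGet {
    private BoundedHttpGet() {
    }

    static String get(String url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(connectTimeoutMs); // limit for establishing the TCP connection
        conn.setReadTimeout(readTimeoutMs);       // limit for each individual blocking read
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    body.write(buffer, 0, n);
                }
                return new String(body.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Note that `setReadTimeout` bounds each read call, not the total download time, so a server that trickles bytes slowly can still take longer than the timeout overall.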
Checkpoint configuration:

| Setting | Value |
|---|---|
| Checkpointing Mode | Exactly Once |
| Interval | 15m 0s |
| Timeout | 10m 0s |
| Minimum Pause Between Checkpoints | 5m 0s |
| Maximum Concurrent Checkpoints | 1 |
| Persist Checkpoints Externally | Disabled |
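With these settings, a checkpoint that has not completed within 10 minutes of its trigger is expired, and the trigger times in the history are spaced 15 minutes apart. The small sketch below reproduces that arithmetic with `java.time`. `CheckpointSchedule` is a hypothetical name, and it assumes the minimum pause is counted from the moment the previous checkpoint ended (here, when it expired); that is an assumption about Flink's scheduling, not something taken from its source.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper reproducing the checkpoint timing arithmetic for the
// configuration above (interval 15m, timeout 10m, minimum pause 5m).
final class CheckpointSchedule {
    static final Duration INTERVAL = Duration.ofMinutes(15);
    static final Duration TIMEOUT = Duration.ofMinutes(10);
    static final Duration MIN_PAUSE = Duration.ofMinutes(5);

    private CheckpointSchedule() {
    }

    // A checkpoint still pending at this time is expired.
    static LocalTime expiresAt(LocalTime trigger) {
        return trigger.plus(TIMEOUT);
    }

    // Earliest next trigger after a checkpoint that expired.
    // Assumption: the minimum pause counts from the expiry of the previous checkpoint.
    static LocalTime nextTriggerAfterExpiry(LocalTime previousTrigger) {
        LocalTime byInterval = previousTrigger.plus(INTERVAL);
        LocalTime byPause = expiresAt(previousTrigger).plus(MIN_PAUSE);
        return byInterval.isAfter(byPause) ? byInterval : byPause;
    }
}
```

Because 10m + 5m equals the 15m interval, both constraints give the same next trigger time (for example 14:03:13 leads to 14:18:13), so the timing settings are consistent with each other; the open question is why most subtasks do not acknowledge within the 10 minutes.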
Sample checkpoint history:

| ID | Status | Acknowledged | Trigger Time | Latest Acknowledgement | End to End Duration | State Size | Buffered During Alignment |
|---|---|---|---|---|---|---|---|
| 20 | In Progress | 3/12 (25%) | 15:03:13 | 15:04:14 | 1m 1s | 5.65 KB | 0 B |
| 19 | Failed | 3/12 | 14:48:13 | 14:50:12 | 10m 0s | 5.65 KB | 0 B |
| 18 | Failed | 3/12 | 14:33:13 | 14:34:50 | 10m 0s | 5.65 KB | 0 B |
| 17 | Failed | 4/12 | 14:18:13 | 14:27:04 | 9m 59s | 2.91 MB | 64.0 KB |
| 16 | Failed | 3/12 | 14:03:13 | 14:05:18 | 10m 0s | 5.65 KB | 0 B |
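In most rows only 3 of the 12 subtasks acknowledge before the timeout. In Flink 1.9 a checkpoint barrier travels in-band with the records, so a subtask handles it only after processing every record queued ahead of it. With a synchronous map() that calls external services, the barrier's wait is therefore roughly the backlog multiplied by the per-record latency. The sketch below does that back-of-the-envelope calculation; `BarrierDelayEstimate` is a hypothetical name, and the 200 ms latency and 3,000-record backlog used in its check are made-up illustration values, not measurements from this job.

```java
import java.time.Duration;

// Back-of-the-envelope estimate: how long a checkpoint barrier waits behind
// queued records when each record is processed synchronously, one at a time.
final class BarrierDelayEstimate {
    private BarrierDelayEstimate() {
    }

    // Approximate wait for a barrier queued behind `queuedRecords` records.
    static Duration barrierWait(long queuedRecords, Duration perRecord) {
        return perRecord.multipliedBy(queuedRecords);
    }

    // True if that wait alone is longer than the checkpoint timeout.
    static boolean exceedsTimeout(long queuedRecords, Duration perRecord, Duration timeout) {
        return barrierWait(queuedRecords, perRecord).compareTo(timeout) > 0;
    }

    // Largest backlog that can be drained within the timeout at the given latency.
    static long maxBacklog(Duration perRecord, Duration timeout) {
        return timeout.toNanos() / perRecord.toNanos();
    }
}
```

With the illustrative numbers, 3,000 queued records at 200 ms each take exactly 10 minutes, the whole checkpoint timeout. Plugging in the measured per-record latency of the two operators and an estimate of the records buffered in front of them would show whether the synchronous external calls alone can explain the expiries; if so, reducing per-record latency or moving the calls off the task thread (for example with Flink's Async I/O) would matter more than tuning the checkpoint settings.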