Kinesis Data Analytics Flink: Continually Increasing Checkpoint Size

Question

I am running a Flink application on the AWS Kinesis Data Analytics (KDA) service. The application's last checkpoint size appears to be growing steadily over time. The sudden drops in checkpoint size visible in the attached graph correspond to deployments: each time I pushed changes to the app, it took a snapshot, updated, and then restored from the snapshot. My concern is that once the application is no longer being actively developed, changes will be deployed less often, and the checkpoint size could eventually grow too large.

Does anyone know what would cause the checkpoint size to grow continuously without end? I am using State TTL on all significant state and removing state in application code when it is no longer needed. Does the growing checkpoint size indicate a bug in my state-handling code, or could something else be at play here?

David Anderson · Answer 1 · 2021-05-07T12:38:56.127

4

Update: See https://stackoverflow.com/a/67435073/2000823 for a better answer.

AWS Kinesis Data Analytics (KDA) is currently based on Flink 1.8, where this documentation regarding state cleanup applies.

Note that, quoting that documentation: "by default if expired state is not read, it won't be removed, possibly leading to ever growing state".

You can also activate cleanup during full snapshots (which seems to be what is happening each time you redeploy) and background cleanup (which sounds like what you want). Note that for some workloads, even with background cleanup enabled, its default settings may not keep up with the rate at which state expires, so some tuning might be necessary.
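To make the difference between these cleanup modes concrete, here is a toy model in plain Java. It is not Flink's implementation; the class `LazyTtlStore` and its methods are made up for illustration. It only shows why entries that are never read again keep occupying space under read-triggered cleanup, and why a background sweep that checks too few entries per pass can lag behind.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of TTL'd state (hypothetical, NOT Flink code). */
class LazyTtlStore {
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    LazyTtlStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, String value, long now) {
        entries.put(key, new Entry(value, now + ttlMillis));
    }

    /** Read-triggered cleanup: an expired entry is dropped only when it is read. */
    String get(String key, long now) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (e.expiresAt <= now) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    /** Background-style cleanup: inspect at most maxChecked entries, drop the expired ones. */
    int sweep(long now, int maxChecked) {
        int checked = 0;
        int removed = 0;
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext() && checked < maxChecked) {
            checked++;
            if (it.next().expiresAt <= now) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int storedEntries() {
        return entries.size();
    }

    public static void main(String[] args) {
        LazyTtlStore store = new LazyTtlStore(1_000);
        for (int i = 0; i < 5; i++) {
            store.put("key-" + i, "v", 0);
        }
        long later = 5_000;                          // every entry has expired by now
        store.get("key-0", later);                   // only the entry that was read is removed
        System.out.println(store.storedEntries());   // prints 4
        store.sweep(later, 2);                       // a small sweep budget removes only 2
        System.out.println(store.storedEntries());   // prints 2
        store.sweep(later, 100);                     // a larger budget catches up
        System.out.println(store.storedEntries());   // prints 0
    }
}
```

In this model the four unread entries stay stored even though they have expired, which is the same shape as the steadily growing checkpoint in the question.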

By the way, background cleanup is enabled by default since Flink 1.10.
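For comparison with your own setup, a TTL configuration that explicitly requests cleanup might look roughly like the sketch below. It assumes the Flink 1.8 API and the RocksDB state backend, and needs the Flink dependency on the classpath. The class name, state name, value types, and the 24-hour TTL are placeholders, not anything taken from your application. As far as I recall, on 1.8 the RocksDB compaction filter also has to be switched on separately through the `state.backend.rocksdb.ttl.compaction.filter.enabled` option, and later releases changed the signature of `cleanupInRocksdbCompactFilter`, so check the documentation for the version you actually run.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

class TtlConfigSketch {
    // Hypothetical helper: the state name, types, and TTL are placeholders.
    static MapStateDescriptor<String, Long> describeState() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.hours(24))
            // Reset the TTL timer whenever an entry is created or written.
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            // Never hand expired values back to user code.
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            // Leave expired entries out of full snapshots (local state is not reduced).
            .cleanupFullSnapshot()
            // Filter out expired entries when RocksDB runs compaction.
            .cleanupInRocksdbCompactFilter()
            .build();

        MapStateDescriptor<String, Long> descriptor =
            new MapStateDescriptor<>("exampleState", String.class, Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```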

If this doesn't answer your question, please clarify precisely how state TTL is configured.

edited May 07 '21 at 12:38

answered Sep 22 '20 at 07:19

David Anderson


Thanks for another quick response! Follow-up question: if I have MapState and explicitly call someMapState.remove(someKey), will the data for that key be removed even if I never read it again, or is it just marked for removal but still there in RocksDB? – ChrisATX Sep 22 '20 at 13:49
someMapState.remove(someKey) does remove the RocksDB entry for someKey. someMapState.clear() removes the entire map for the current stream key. – David Anderson Sep 22 '20 at 14:39
