How are checkpoints and trimming related in AWS KCL library?
The documentation page Handling Startup, Shutdown, and Throttling says:
By default, the KCL begins reading records from the tip of the stream;, which is the most recently added record. In this configuration, if a data-producing application adds records to the stream before any receiving record processors are running, the records are not read by the record processors after they start up.
To change the behavior of the record processors so that it always reads data from the beginning of the stream, set the following value in the properties file for your Amazon Kinesis Streams application:
initialPositionInStream = TRIM_HORIZON
The documentation page Developing an Amazon Kinesis Client Library Consumer in Java says:
Streams requires the record processor to keep track of the records that have already been processed in a shard. The KCL takes care of this tracking for you by passing a checkpointer (IRecordProcessorCheckpointer) to processRecords. The record processor calls the checkpoint method on this interface to inform the KCL of how far it has progressed in processing the records in the shard. In the event that the worker fails, the KCL uses this information to restart the processing of the shard at the last known processed record.
The first page seems to say that the KCL resumes at the tip of the stream, the second page at the last known processed record (that was marked as processed by the RecordProcessor
using the checkpointer
). In my case, I definitely need to restart at the last known processed record. Do I need to set the initialPositionInStream to TRIM_HORIZON?