I am exploring AWS Kinesis for a data processing project that replaces old batch ETL processing with a stream-based approach.
One of the key requirements for this project is the ability to reprocess data in cases such as:
- A bug is discovered and fixed, and the application is redeployed. The data then needs to be reprocessed from the beginning.
- New features are added, and the history needs to be reprocessed, either fully or partially.
These scenarios are nicely documented for Kafka here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios.
I have seen the timestamp-based ShardIterator in Kinesis, and I think a Kafka-like resetter tool could be built using the Kinesis APIs, but it would be great if something like this already exists. Even if it doesn't, it would be good to learn from those who have solved similar problems.
So, does anyone know of existing resources, patterns, or tools for doing this in Kinesis?
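For concreteness, this is roughly what I have in mind. It is a minimal Python sketch, not an existing tool, and it assumes a boto3 Kinesis client. It replays every shard from a given timestamp using `AT_TIMESTAMP` shard iterators. `replay_from`, `list_shard_ids`, and the `handler` callback are hypothetical names. The sketch ignores resharding order (parent shards before children), GetRecords throttling, and checkpointing, and it can only go back as far as the stream's retention period.

```python
def list_shard_ids(client, stream_name):
    """Collect all shard IDs, following ListShards pagination."""
    shard_ids = []
    kwargs = {"StreamName": stream_name}
    while True:
        resp = client.list_shards(**kwargs)
        shard_ids.extend(shard["ShardId"] for shard in resp["Shards"])
        token = resp.get("NextToken")
        if not token:
            return shard_ids
        # ListShards does not accept StreamName together with NextToken.
        kwargs = {"NextToken": token}


def replay_from(client, stream_name, start, handler, limit=1000):
    """Hypothetical resetter: re-read each shard from `start` (a datetime)
    and pass every record to handler(shard_id, record).

    Assumption: shards are read one after another, without respecting
    parent/child ordering after resharding and without backoff on throttling.
    """
    for shard_id in list_shard_ids(client, stream_name):
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard_id,
            ShardIteratorType="AT_TIMESTAMP",
            Timestamp=start,
        )["ShardIterator"]
        while iterator:
            resp = client.get_records(ShardIterator=iterator, Limit=limit)
            for record in resp["Records"]:
                handler(shard_id, record)
            # Stop once this shard has caught up with the tip of the stream.
            if resp.get("MillisBehindLatest", 0) == 0:
                break
            # A closed shard eventually returns no NextShardIterator.
            iterator = resp.get("NextShardIterator")
```

Usage would be something like `replay_from(boto3.client("kinesis"), "my-stream", start, handler)`, where `my-stream` is a placeholder stream name. Taking the client as a parameter keeps the replay logic testable without AWS access.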