
I am exploring AWS Kinesis for a data processing requirement that replaces old batch ETL processing with a stream based approach.

One of the key requirements for this project is the ability to reprocess data when:

  • A bug is discovered and fixed, and the application is redeployed. The data then needs to be reprocessed from the beginning.
  • New features are added, and the history needs to be reprocessed either fully or partially.

These scenarios are nicely documented for Kafka here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios

I have seen the timestamp-based shard iterator in Kinesis (the `AT_TIMESTAMP` iterator type), and I think a Kafka-like resetter tool could be built on top of the Kinesis APIs, but it would be great if something like this already exists. Even if it doesn't, it would be good to learn from those who have solved similar problems.
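To make the idea concrete, here is a minimal sketch of what such a replay helper might look like. It assumes a client object exposing the `list_shards`, `get_shard_iterator` and `get_records` calls of the boto3 Kinesis client; the names `iter_shard_ids`, `replay_from`, `handle` and `batch_size` are hypothetical and chosen only for illustration. It reads shards one after another, so it does not preserve ordering across shards or parent/child ordering after a reshard, it has no throttling or retry handling, and it can only reach back as far as the stream's retention period.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Iterator, Optional


def iter_shard_ids(client: Any, stream_name: str) -> Iterator[str]:
    """Yield every shard ID of the stream, following ListShards pagination."""
    kwargs: Dict[str, Any] = {"StreamName": stream_name}
    while True:
        page = client.list_shards(**kwargs)
        for shard in page["Shards"]:
            yield shard["ShardId"]
        token = page.get("NextToken")
        if not token:
            return
        # ListShards does not accept StreamName together with NextToken.
        kwargs = {"NextToken": token}


def replay_from(
    client: Any,
    stream_name: str,
    start: datetime,
    handle: Callable[[Dict[str, Any]], None],
    end: Optional[datetime] = None,
    batch_size: int = 1000,
) -> int:
    """Pass every record that arrived at or after `start` to `handle`.

    Stops reading a shard when it has caught up with the tip, when the shard
    is closed, or when a record arrives after `end` (if given).
    Returns the number of records handed to `handle`.
    """
    replayed = 0
    for shard_id in iter_shard_ids(client, stream_name):
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard_id,
            ShardIteratorType="AT_TIMESTAMP",
            Timestamp=start,
        )["ShardIterator"]
        while iterator:
            resp = client.get_records(ShardIterator=iterator, Limit=batch_size)
            past_end = False
            for record in resp["Records"]:
                if end is not None and record["ApproximateArrivalTimestamp"] > end:
                    past_end = True
                    break
                handle(record)
                replayed += 1
            # MillisBehindLatest == 0 means this shard has been read to its tip.
            if past_end or resp.get("MillisBehindLatest", 0) == 0:
                break
            # A closed shard returns no NextShardIterator once it is exhausted.
            iterator = resp.get("NextShardIterator")
    return replayed
```

With boto3 this would be called along the lines of `replay_from(boto3.client("kinesis"), "my-stream", start, process_record)`, where `my-stream` and `process_record` are placeholders. `start` and `end` should be timezone-aware datetimes so they compare cleanly with the arrival timestamps on the records.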

So, does anyone know of any existing resources, patterns and tools available to do this in Kinesis?

Rahul
  • Hi Rahul, have you tried Kinesis-VCR? Did it work for you? If you found a different solution, please share it. Thanks – Srivignesh KN Feb 26 '18 at 23:57

1 Answer


I have run into scenarios where I needed to reprocess records that had already gone through Kinesis, and I used Kinesis-VCR to do it.

Kinesis-VCR records Kinesis streams and maintains metadata about the files processed at a given time.

You can later use it to replay the events for any given time range.

Here is the GitHub link:

https://github.com/scopely/kinesis-vcr

Let me know if this works for you.

Thanks & Regards, Srivignesh KN

Srivignesh KN