I saw that similar questions already exist:
Copying only new records from AWS DynamoDB to AWS Redshift
Loading data from Amazon dynamoDB to redshift
Unfortunately, most of them are outdated (Amazon has since introduced new services) and/or give different answers.
In my case I have two databases (Redshift and DynamoDB) and I have to:
- Keep the Redshift database up to date
- Store a database backup on S3
To do that, I want to use the following approach:
- Back up only new/modified records from DynamoDB to S3 at the end of each day (one file per day)
- Update the Redshift database using the file from S3
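To make the approach concrete, here is a minimal sketch of the flow I have in mind. It is only an illustration under assumptions: it presumes each item carries an `updated_at` attribute (hypothetical; my table may need one added), and the helper names, the `backups` prefix, the bucket, the table and the IAM role are all placeholders. The actual S3 upload and the Redshift connection are left out.

```python
import json
from datetime import date, datetime, timezone


def changed_since(records, since):
    """Keep records whose (hypothetical) 'updated_at' ISO timestamp is >= since."""
    return [r for r in records if datetime.fromisoformat(r["updated_at"]) >= since]


def daily_key(day, prefix="backups"):
    """Build the S3 object key for one day's file, e.g. backups/2024/01/02.json."""
    return f"{prefix}/{day:%Y/%m/%d}.json"


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def copy_statement(table, bucket, key, iam_role):
    """Build a Redshift COPY statement that loads the daily file from S3."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS JSON 'auto';"
    )


if __name__ == "__main__":
    # Placeholder data standing in for DynamoDB items.
    items = [
        {"id": 1, "updated_at": "2024-01-01T08:00:00+00:00"},
        {"id": 2, "updated_at": "2024-01-02T09:30:00+00:00"},
    ]
    start_of_day = datetime(2024, 1, 2, tzinfo=timezone.utc)
    key = daily_key(date(2024, 1, 2))
    print(key)
    print(to_json_lines(changed_since(items, start_of_day)))
    print(copy_statement("events", "my-bucket", key,
                         "arn:aws:iam::123456789012:role/example"))
```

The open part is the first step: how to obtain only the changed items without reading the whole table.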
So my question is: what is the most efficient way to do that?
I read this tutorial, but I am not sure that AWS Data Pipeline can be configured to "catch" only new records from DynamoDB. If that is not possible, I need a different approach, because scanning the entire database every time is not an option.
Thank you in advance!