I want to track username changes over time.
I have the following users table in Redshift:
id   username   valid_from   valid_to     current
--------------------------------------------------
1    joe1       2015-01-01   2015-01-15   No
1    joe2       2015-01-15   NULL         Yes
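For reference, a rough DDL sketch of this table; the column types and the 'Yes'/'No' flag encoding are my assumptions, not the actual schema:

```sql
-- Hypothetical DDL for the SCD Type 2 users table in Redshift
CREATE TABLE users (
    id         INTEGER     NOT NULL,  -- natural key from the source users table
    username   VARCHAR(64) NOT NULL,
    valid_from DATE        NOT NULL,  -- when this version of the row became active
    valid_to   DATE,                  -- NULL while this is the latest version
    "current"  VARCHAR(3)  NOT NULL   -- 'Yes'/'No'; quoted in case CURRENT is treated as a reserved word
);
```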
My source data is in RDS Postgres. I'm considering several options for handling this:
1) Create a users_history table and track changes inside the RDS Postgres db. This requires changes to my app, and the table could potentially get huge.
2) Have an ETL process query the users source table every ~5 minutes for new changes (ordered by updated_at) and dump them to DynamoDB.
3) Have an ETL process dump the data to S3, COPY it into a temporary table inside Redshift, and run the update queries there (see the sketch after this list).
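For option 3, the load step could look roughly like this; the bucket path, IAM role, and staging table layout are hypothetical and would depend on what the ETL job actually exports:

```sql
-- Hypothetical staging table mirroring the source users table in RDS Postgres
CREATE TEMP TABLE users_staging (
    id         INTEGER,
    username   VARCHAR(64),
    updated_at TIMESTAMP
);

-- Load the latest extract that the ETL job dumped to S3
COPY users_staging
FROM 's3://my-etl-bucket/users/latest/'                       -- hypothetical bucket/prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'    -- hypothetical role
FORMAT AS CSV
TIMEFORMAT 'auto';
```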
Can you advise on what would be scalable and easily maintainable in the long run? Keep in mind that these tables can be massive and I'll be tracking SCD for many tables.
Thanks.
Update 1: I chatted with AWS support and they pointed me to this, which seems like a good solution: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html
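Following that merge pattern, the SCD update for the users table could look roughly like this. This is only a sketch: it assumes the users_staging table from option 3 was loaded in the same session, that the extract has at most one row per user, and that username is the only tracked attribute.

```sql
BEGIN TRANSACTION;

-- Close out the current row for any user whose username changed in the staging data
UPDATE users
SET valid_to  = s.updated_at::date,
    "current" = 'No'
FROM users_staging s
WHERE users.id = s.id
  AND users."current" = 'Yes'
  AND users.username <> s.username;

-- Insert a new current row for every user without an open row:
-- either a brand-new user or one whose row was just closed above
INSERT INTO users (id, username, valid_from, valid_to, "current")
SELECT s.id, s.username, s.updated_at::date, NULL, 'Yes'
FROM users_staging s
LEFT JOIN users u
  ON u.id = s.id
 AND u."current" = 'Yes'
WHERE u.id IS NULL;

END TRANSACTION;

-- Optional for a temp table (it disappears at session end anyway)
DROP TABLE users_staging;
```

This keeps the stage / update / insert / drop shape from the linked doc and runs the merge in one transaction, so readers never see a half-merged state.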