0

I am loading the data from Amazon RDS(Aurora database) to Redshift using AWS Glue ETL. But I can't figure out how to do incremental loading(upsert)? Is there a way to create a filter/parameter on date while reading from source database to load only new/changed records to Redshift?

I see that S3 source works with AWS bookmarks but what is the option in case of relational database. your suggestions are greatly appreciated!! Thank you!

varun
  • 135
  • 3
  • 12

1 Answers1

0

I have used mysql connector as an external zip file uploaded to S3 and used in my AWS Glue python to embed Mysql queries and run INSERT INTO table...ON DUPLICATE KEY for upsert operations to RDS Aurora. If you are using an Mysql, you may follow the same for filtering result set based on a where clause. Please refer to this link :

Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array

Yuva
  • 2,831
  • 7
  • 36
  • 60
  • Hi Yuva, Thank you for the response! I am trying to load the data from Aurora database to Redshift using AWS Glue. Since Redshift doesn't enforce constraints, I can't use the SQL you mentioned. I am looking for a way to filter the data (to get only yesterday's incremental data from Aurora database) and load it to Redshift staging table using SPARK python code in AWS Glue. Please let me know if you have any suggestions. Thank you! – varun Jan 24 '19 at 03:29