
I have an AWS account with a Postgres RDS database that represents the production environment for an app. We have another team that is building an analytics infrastructure in a different AWS account. They need to be able to pull data from our production database to hydrate their reports.

From my research so far, it seems there are a couple of options:

  1. Create a bash script that runs on a cron schedule, uses pg_dump and pg_restore, and lives on an EC2 instance in one of the accounts (roughly what the first sketch after this list shows).

  2. Automate creating a snapshot on a schedule and ship it to the other account's S3 bucket. Then create a Lambda (or other script) that triggers when the snapshot lands in the S3 bucket and restores it (the snapshot-creation/sharing step is sketched second after this list). The downside is that we'd have to create a new RDS instance with each restore (since you can't restore a snapshot to an existing instance), which changes the FQDN of the database (we could mitigate that with Route 53 and a CNAME that gets updated, but that's complicated).

  3. Create a read-replica in the origin AWS account and open up security for that instance so they can just access it directly (but then my account is responsible for all the costs associated with hosting and accessing it).
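For concreteness, a minimal sketch of what option 1 might look like (the hostnames, database names, and users below are placeholders, and it assumes credentials are supplied via .pgpass or PGPASSWORD):

    #!/usr/bin/env bash
    # Hypothetical nightly sync: dump prod, restore into the analytics DB.
    set -euo pipefail

    SRC_HOST="prod-db.abc123.us-east-1.rds.amazonaws.com"       # source RDS endpoint (placeholder)
    DST_HOST="analytics-db.def456.us-east-1.rds.amazonaws.com"  # target RDS endpoint (placeholder)
    DB_NAME="app_production"
    DUMP_FILE="/tmp/${DB_NAME}_$(date +%F).dump"

    # Custom format lets pg_restore parallelize and selectively restore
    pg_dump --host "$SRC_HOST" --username readonly_user \
            --format=custom --no-owner --file "$DUMP_FILE" "$DB_NAME"

    # --clean --if-exists drops and recreates objects in the target database
    pg_restore --host "$DST_HOST" --username analytics_user \
               --dbname analytics --clean --if-exists --no-owner "$DUMP_FILE"

    rm -f "$DUMP_FILE"

scheduled with a crontab entry such as 0 3 * * * /opt/scripts/sync_prod_to_analytics.sh.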
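And a sketch of the snapshot step from option 2 (the instance identifier and account ID are placeholders). Note that RDS can share a manual snapshot directly with another account via snapshot attributes, which would avoid staging it in S3 entirely:

    # Take a manual snapshot of the production instance (placeholder names)
    SNAPSHOT_ID="prod-db-$(date +%F)"
    aws rds create-db-snapshot \
        --db-instance-identifier prod-db \
        --db-snapshot-identifier "$SNAPSHOT_ID"

    # Wait until the snapshot is ready
    aws rds wait db-snapshot-available --db-snapshot-identifier "$SNAPSHOT_ID"

    # Grant the analytics account (hypothetical ID) permission to copy/restore it
    aws rds modify-db-snapshot-attribute \
        --db-snapshot-identifier "$SNAPSHOT_ID" \
        --attribute-name restore \
        --values-to-add 123456789012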

None of these seem like good options. Is there some other way to accomplish this?

user2719094

1 Answer


I would suggest using AWS Database Migration Service (DMS). It can listen to changes on your source database (change data capture) and stream them to a target: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html

There is also a third-party blog post explaining how to set this up: https://medium.com/tensult/cross-account-and-cross-region-rds-mysql-db-replication-part-1-55d307c7ae65

Pricing is per hour and depends on the size of the DMS replication instance (an EC2 instance under the hood). It runs in the target account, so it will not land on your cost center.
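If you want to try it, here is a rough sketch of the task setup with the AWS CLI, assuming you have already created the source and target endpoints and a replication instance in the analytics account (all ARNs and identifiers below are hypothetical):

    # Create a task that does an initial full load, then streams ongoing changes (CDC)
    aws dms create-replication-task \
        --replication-task-identifier prod-to-analytics \
        --source-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:SRCEXAMPLE \
        --target-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:TGTEXAMPLE \
        --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:REPEXAMPLE \
        --migration-type full-load-and-cdc \
        --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1","object-locator":{"schema-name":"public","table-name":"%"},"rule-action":"include"}]}'

    # Start the task once it has been created
    aws dms start-replication-task \
        --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:TASKEXAMPLE \
        --start-replication-task-type start-replication

full-load-and-cdc seeds the target first and then keeps streaming changes; plain cdc also works if the target is already hydrated.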

Sébastien Stormacq
  • The medium post is for MySQL; it won't work for Postgres on RDS. With Postgres on RDS you can only perform logical replication across databases (outside of accounts/RDS). – maxTrialfire Jul 24 '19 at 22:49
  • Agreed on the blog post, but AWS Database Migration Service does support continuous streaming from an Amazon RDS Postgres data source, so the same architecture is possible (although I did not try it personally). For PostgreSQL, AWS DMS sets up logical replication slots and uses the test_decoding plugin to read changes from the source and migrate them to the target. – Sébastien Stormacq Jul 24 '19 at 23:47
  • @SébastienStormacq Thanks! This looks really promising, but I can't get it to work. My replication task just keeps failing and no log streams are getting created in CloudWatch. I can tell that it can connect to the source/destination fine, but when I run the task it just says "Last Error Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2673] [1020482] Stop Reason FATAL_ERROR Error Level FATAL". Any ideas? – user2719094 Jul 25 '19 at 14:52
  • Logical replication, however, has many limitations. It's a pub/sub model that ONLY streams DML changes, so if any changes occur to the master schema, those changes do not replicate to the slaves. – maxTrialfire Jul 25 '19 at 14:54
  • I've tried to use DMS for that purpose and wasn't very happy. It turns out it drops schema defaults such as serial IDs during the migration; the same issue is described here: https://stackoverflow.com/questions/55633029/postgresql-primary-key-sequence-lost-after-migration-using-aws-dms There is a workaround with the Schema Conversion Tool, but honestly sharing a pg_dump seems like less of a hassle. – thepolina Apr 30 '20 at 06:46