
I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.

However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a workaround?

"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB"
jenswirf

3 Answers


Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:

  1. Create a data node of the type SqlDataNode. Specify the table name and select query.
  2. Set up the database connection by specifying the RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
  3. Create a data node of the type S3DataNode.
  4. Create a CopyActivity and set the SqlDataNode as input and the S3DataNode as output.
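Pieced together, a pipeline definition for the steps above might look roughly like this. This is a sketch, not a tested template: the object IDs (MyRdsDatabase, MyS3DataNode, etc.), bucket path, and table are placeholders, so check the exact field names against the Data Pipeline object reference before using it:

```json
{
  "objects": [
    {
      "id": "MyRdsDatabase",
      "type": "RdsDatabase",
      "rdsInstanceId": "your-instance-id",
      "username": "db_user",
      "*password": "db_password",
      "databaseName": "THE_DB"
    },
    {
      "id": "MySqlDataNode",
      "type": "SqlDataNode",
      "database": { "ref": "MyRdsDatabase" },
      "table": "blahs",
      "selectQuery": "select blah_id from #{table}"
    },
    {
      "id": "MyS3DataNode",
      "type": "S3DataNode",
      "directoryPath": "s3://your-bucket/blahs/"
    },
    {
      "id": "MyCopyActivity",
      "type": "CopyActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "input": { "ref": "MySqlDataNode" },
      "output": { "ref": "MyS3DataNode" }
    }
  ]
}
```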
PeterssonJesper

This doesn't work yet. AWS hasn't built / released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python code to do it and drop that in a script on S3 using scriptUri. You could also just write a psql command to dump the table to a CSV and then pipe that to OUTPUT1_STAGING_DIR with "staging: true" in that activity node.

something like this:

{
  "id": "DumpCommand",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}

I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.

  • Pros: super straightforward and requires no additional script files to upload to S3.
  • Cons: not exactly secure. Your DB password will be transmitted over the wire without encryption.
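If you'd rather go the "little Python script on S3" route mentioned above, a minimal sketch could look like the following. This is untested illustration, not the answer's actual script: the function names are made up, it assumes psycopg2 is installed on the EC2 resource, and it reads connection details from the standard PG* environment variables so the password isn't baked into the pipeline definition:

```python
import os

def copy_sql(table, columns):
    """Build a COPY ... TO STDOUT statement that emits CSV with a header row."""
    cols = ", ".join(columns)
    return f"COPY (SELECT {cols} FROM {table}) TO STDOUT WITH CSV HEADER"

def dump_table(conn, table, columns, out_dir):
    """Stream one table to <out_dir>/<table>.csv using Postgres' COPY protocol."""
    path = os.path.join(out_dir, f"{table}.csv")
    with conn.cursor() as cur, open(path, "w") as f:
        cur.copy_expert(copy_sql(table, columns), f)
    return path

if __name__ == "__main__":
    import psycopg2  # third-party driver; must be installed on the EC2 resource
    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        dbname=os.environ["PGDATABASE"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
    )
    # With "staging: true", Data Pipeline ships whatever lands in
    # OUTPUT1_STAGING_DIR to the S3 output node.
    dump_table(conn, "blahs", ["blah_id"], os.environ["OUTPUT1_STAGING_DIR"])
```

Using COPY instead of row-by-row SELECT keeps the dump fast and avoids hand-escaping commas and quotes in the CSV.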

Look into the new stuff AWS just launched on parameterized templating for Data Pipeline: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.

xgess
    As a quick update: PostgreSQL now supports SSL, so if you're on a newer version make sure to set ssl=true so encryption is set up. – Robert Mennell Jul 25 '16 at 17:30
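For reference, enabling SSL via the connection string from the question would look something like this (the `ssl=true` parameter comes from the PostgreSQL JDBC driver's connection options):

```
jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB?ssl=true
```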

AWS now allows partners to do near-real-time RDS -> Redshift inserts.

https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/

Manuel G
    As of 2/25/2015 the above link only supports MySQL -> Redshift and doesn't have support for Postgres, which OP is asking about. – Cody Caughlan Feb 25 '15 at 18:43
    Looks like it's been updated since then. From the link: _All four data integration solutions discussed above can be used with all RDS database engines (MySQL, SQL Server, PostgreSQL, and Oracle)._ – ryan Aug 17 '15 at 23:26