1

I have been creating a new RDS table every day that sometimes reaches about 10 GB in that day. My goal is to dump this table to S3 more efficiently.

My current method of doing this is:

  1. SELECT * FROM table;

  2. Dump those rows to a CSV

  3. Upload that CSV to S3

  4. Delete the CSV file

Is there a way that I am able to dump the RDS table to S3 directly instead of generating that CSV locally and then dumping that to S3?

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
kaybuzz
  • 97
  • 1
  • 11
  • Possible duplicate of [Exporting a AWS Postgres RDS Table to AWS S3](https://stackoverflow.com/questions/39899210/exporting-a-aws-postgres-rds-table-to-aws-s3) – Nick Walsh May 10 '19 at 01:38
  • Tried data pipeline prior to posting this (forgot to mention that). Problem is that CopyActivity to S3 does not support files larger than 4 GB when moving to S3. Data pipeline would have been an appropriate solution if it weren't for the S3 maximum limit upload and CopyActivity not supporting automatic chunking for S3 targets. In my above method, I chunked the files when needed. – kaybuzz May 16 '19 at 23:53

1 Answers1

4

Check out the Database Migration Service of Amazon.

https://aws.amazon.com/dms/

I found creating a task for dumping RDS to S3 to be fairly straight forward in the console and this guide should applicable to you even those it discusses Aurora: https://aws.amazon.com/blogs/database/replicate-data-from-amazon-aurora-to-amazon-s3-with-aws-database-migration-service/

I have a lambda that is scheduled to run on a periodic basis that kicks off a DMS task to dump a few tables of a database.

The code for the lambda (or just a manual job) is really simple (Python 3.6):

import boto3
import os

DMS_TASK_ARN = os.environ.get("DMS_TASK_ARN")
MAX_RETRY = int(os.environ.get("MAX_RETRY", 600))
WAIT_TIME = int(os.environ.get("WAIT_TIME", 10))

dms = boto3.client('dms')

def lambda_handler(event, context):
    # Start the replication task
    print(f"Starting replication task {DMS_TASK_ARN}")
    dms.start_replication_task(
        ReplicationTaskArn=DMS_TASK_ARN,
        StartReplicationTaskType='reload-target'
    )
JD D
  • 7,398
  • 2
  • 34
  • 53