
I'm trying to export some data to .csv from an RDS instance. I have tried redirecting it to an EC2 box we are using as the client and get a permissions error, which I broadly understand. Is there a workaround for this? Or can I do something like

SELECT * INTO OUTFILE S3:.... from table bob....

This is a process that will run weekly with dynamic parameters such as dates and database names. I currently use MySQL stored procedures executing dynamic SQL (a rough sketch of what I mean is below).
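For context, a minimal sketch of the kind of dynamic-SQL procedure I mean; the procedure, database, table and column names here are just placeholders, not the real code:

-- Hypothetical sketch of a weekly export procedure driven by dynamic SQL.
-- Database, table and column names are placeholders.
DELIMITER //
CREATE PROCEDURE weekly_export(IN p_db VARCHAR(64), IN p_cutoff DATE)
BEGIN
  -- Build the query text at runtime from the parameters, then execute it.
  SET @sql = CONCAT('SELECT * FROM ', p_db, '.bob WHERE created_at < ', QUOTE(p_cutoff));
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END //
DELIMITER ;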

Thanks in advance.

8adger

2 Answers


If you are using the AWS Aurora flavour of RDS, you're in luck: https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-aurora-can-export-data-into-amazon-s3/
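On Aurora the export looks roughly like the following. This is a sketch only: the bucket path and table name are placeholders, and the cluster needs an IAM role attached that allows it to write to the bucket.

-- Sketch of Aurora MySQL's SELECT ... INTO OUTFILE S3 syntax.
-- Bucket, prefix and table names are placeholders; the cluster must have
-- an IAM role granting write access to the target bucket.
SELECT *
FROM bob
INTO OUTFILE S3 's3://my-bucket/exports/bob_dump'
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';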


If you are using regular MySQL on RDS, I have a workaround that achieves the same thing: I used AWS Data Pipeline with a CopyActivity, which copies the results of a SQL query you specify to an S3 bucket you specify.

Here is an extract from my pipeline definition that shows the input, output and CopyActivity:

{
  "myDescription": "Mysql datanode that represents the input database and query that gets the marketing table data.",

  "type": "MySqlDataNode",
  "name": "SourceRDSTable",
  "id": "SourceRDSTable",
  "connectionString": "jdbc:mysql://marketing.example.com:3306/schemaname",
  "table": "marketing",
  "selectQuery": "SELECT * FROM schemaname.marketing WHERE active=1 AND isOptedOut=0",
  "username": "mysqluser",
  "*password": "redacted"
},
{
  "myDescription": "S3 datanode that represents the S3 directory where the table data will be stored.",

  "type": "S3DataNode",
  "name": "S3OutputLocation",
  "id": "S3OutputLocation",
  "filePath": "s3://mys3bucket/output/marketing_dump_#{format(@scheduledStartTime, 'YYYY-MM-dd')}.csv"
},
{
  "myDescription": "CopyActivity used to dump a subset of the marketing table to S3 as csv",

  "type": "CopyActivity",
  "name": "Copy marketing table to S3",
  "id": "RDStoS3CopyActivity",
  "input": {
    "ref": "SourceRDSTable"
  },
  "output": {
    "ref": "S3OutputLocation"
  },
  "runsOn": {
    "ref": "ec2_worker_instance"
  }
}

(Although I'm happy with the results now that it's done, I have to admit it took me a week to set up this pipeline! It's been running daily for 4 months with no hitches, but if I were to do it again I would instead upgrade to Aurora and use its INTO OUTFILE S3 syntax.)

Tom
  • Hi Tom, many thanks for taking the time to share your solution. I am looking to archive data from a handful of tables across hundreds of databases using different dates. My existing method calls stored procedures which build dynamic SQL. The workflow is: extract the data to archive, then delete it. Somehow I need to get the data out before I delete it. Because of the dynamic nature of the queries, I think the Data Pipeline approach would get quite messy. Why can't they just add support for INTO OUTFILE? – 8adger Feb 28 '18 at 10:58
  • I think they want to sell Aurora by implementing useful features on Aurora and not so much on MySQL. In your case, would it be possible to make an AWS API call to create an Aurora read replica of the original database, do the SELECT INTO OUTFILE S3 on the replica, then shut the read replica down? – Tom Mar 01 '18 at 11:53

SELECT ... INTO OUTFILE S3 followed by LOAD DATA FROM S3 is much slower than using SELECT ... INTO OUTFILE with LOAD DATA INFILE (close to twice as slow, especially on large datasets).
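To make the comparison concrete, these are roughly the two pairs of statements being compared (bucket paths and table names are placeholders). Note that the plain INTO OUTFILE / LOAD DATA INFILE pair writes to and reads from the database server's local filesystem, which is why it isn't usable on RDS in the first place.

-- Aurora's S3 variants (bucket, prefix and table names are placeholders):
SELECT * FROM bob INTO OUTFILE S3 's3://my-bucket/bob_dump'
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

LOAD DATA FROM S3 PREFIX 's3://my-bucket/bob_dump'
  INTO TABLE bob_archive
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Classic local-filesystem variants (only usable when you control the server):
SELECT * FROM bob INTO OUTFILE '/tmp/bob_dump.csv'
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

LOAD DATA INFILE '/tmp/bob_dump.csv'
  INTO TABLE bob_archive
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';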