I want to use the AWS Data Pipeline service to pipe data from an Oracle RDS database to S3 and then on to Glacier. Could someone please tell me how to achieve that?
Possible duplicate of [Export RDS data to S3/Glacier](http://stackoverflow.com/questions/34482577/export-rds-data-to-s3-glacier) – Mark B Dec 29 '15 at 15:18
4 Answers
You can set up an AWS Data Pipeline to do an incremental daily copy of an RDS Oracle table to S3. Once the data is in the S3 bucket, you can archive it to Glacier: https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/
You need to upload the Oracle JDBC driver, downloaded from http://www.oracle.com/technetwork/database/features/jdbc/jdbc-drivers-12c-download-1958347.html, to an S3 bucket location and specify that S3 path in the jdbcDriverJarUri field.
See http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-rdsdatabase.html.
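For example, a minimal sketch of staging the driver jar in S3 with boto3 (the bucket, key, and local filename below are placeholders, not values taken from the template):

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key; use whatever S3 location you choose for the driver jar.
s3.upload_file(
    Filename="ojdbc7.jar",           # the driver downloaded from Oracle
    Bucket="my-pipeline-artifacts",  # placeholder bucket name
    Key="drivers/ojdbc7.jar",
)

# The resulting path, e.g. s3://my-pipeline-artifacts/drivers/ojdbc7.jar,
# is what you would supply as myOracleJdbcDriverUri / jdbcDriverJarUri.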
This is a sample DataPipeline template that schedules an Amazon EC2 instance to do an incremental data copy from an Amazon RDS Oracle table to Amazon S3. The RDS Oracle table must have a column that stores the last modified time value. This template will copy changes that are made to the table between scheduled intervals starting from the scheduled start time. Physical deletes to the table will not be copied. The output will be written as a CSV file in a timestamped subfolder under the output S3 folder.
{
  "metadata": {
    "templateName": "Incremental copy of RDS Oracle table to S3",
    "templateDescription": "Incremental copy of RDS Oracle table to S3"
  },
  "objects": [
    {
      "name": "DailySchedule",
      "id": "DailySchedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME",
      "period": "1 hour",
      "type": "Schedule"
    },
    {
      "id": "Default",
      "name": "Default",
      "schedule": {
        "ref": "DailySchedule"
      },
      "failureAndRerunMode": "CASCADE",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    },
    {
      "name": "SourceRDSTable",
      "id": "SourceRDSTable",
      "type": "RdsDatabase",
      "table": "#{myRDSTableName}",
      "username": "#{myRDSUsername}",
      "*password": "#{*myRDSPassword}",
      "jdbcDriverJarUri": "#{myOracleJdbcDriverUri}",
      "rdsInstanceId": "#{myRDSInstanceId}",
      "scheduleType": "TIMESERIES",
      "selectQuery": "select * from #{table} where #{myRDSTableLastModifiedCol} >= '#{format(@scheduledStartTime, 'YYYY-MM-dd HH-mm-ss')}' and #{myRDSTableLastModifiedCol} <= '#{format(@scheduledEndTime, 'YYYY-MM-dd HH-mm-ss')}'"
    },
    {
      "name": "DestinationS3Location",
      "id": "DestinationS3Location",
      "type": "S3DataNode",
      "scheduleType": "TIMESERIES",
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
    },
    {
      "name": "RDSToS3CopyActivity",
      "id": "RDSToS3CopyActivity",
      "type": "CopyActivity",
      "scheduleType": "TIMESERIES",
      "input": {
        "ref": "SourceRDSTable"
      },
      "output": {
        "ref": "DestinationS3Location"
      },
      "runsOn": {
        "ref": "Ec2Instance"
      }
    },
    {
      "name": "Ec2Instance",
      "id": "Ec2Instance",
      "type": "Ec2Resource",
      "scheduleType": "TIMESERIES",
      "instanceType": "#{myEC2InstanceType}",
      "securityGroups": "#{myEc2RdsSecurityGrps}",
      "terminateAfter": "2 hours",
      "actionOnTaskFailure": "terminate"
    }
  ],
  "parameters": [
    {
      "id": "myRDSInstanceId",
      "type": "String",
      "description": "RDS Oracle my_db_instance_identifier"
    },
    {
      "id": "myOracleJdbcDriverUri",
      "type": "String",
      "description": "S3 path of Oracle Jdbc Driver."
    },
    {
      "id": "myRDSUsername",
      "type": "String",
      "description": "RDS username"
    },
    {
      "id": "*myRDSPassword",
      "type": "String",
      "description": "RDS password"
    },
    {
      "id": "myRDSTableName",
      "type": "String",
      "description": "RDS table name"
    },
    {
      "id": "myEc2RdsSecurityGrps",
      "type": "String",
      "isArray": "true",
      "description": "RDS security group(s)",
      "optional": "true",
      "helpText": "The names of one or more EC2 security groups that have access to the RDS cluster.",
      "watermark": "security group name"
    },
    {
      "id": "myRDSTableLastModifiedCol",
      "type": "String",
      "description": "Last modified column name",
      "helpText": "Name of the column that stores the last modified time value in the RDS table."
    },
    {
      "id": "myEC2InstanceType",
      "type": "String",
      "default": "t1.micro",
      "description": "EC2 instance type",
      "helpText": "The type of the EC2 instance that will be launched on your behalf to do the copy"
    },
    {
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey",
      "description": "Output S3 folder"
    }
  ]
}
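If you prefer to drive this from code rather than the console, here is a rough sketch with boto3; the pipeline name, uniqueId, and the single converted object are illustrative only, since the Data Pipeline API expects each template object flattened into key/value fields:

import boto3

dp = boto3.client("datapipeline")

# Create an empty pipeline; name and uniqueId are placeholders.
pipeline_id = dp.create_pipeline(
    name="rds-oracle-to-s3",
    uniqueId="rds-oracle-to-s3-demo",
)["pipelineId"]

# The API takes the template's objects flattened into key/value fields.
# Only the "Default" object is shown; the others follow the same pattern.
pipeline_objects = [
    {
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "schedule", "refValue": "DailySchedule"},
        ],
    },
    # ... remaining objects from the template, converted the same way
]

dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
dp.activate_pipeline(pipelineId=pipeline_id)

Alternatively, the AWS CLI's put-pipeline-definition command accepts a definition file in the format shown above directly (--pipeline-definition file://template.json), which avoids converting the objects by hand.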

@AravindR Can we also go the other way, from S3 (CSV file) into an Oracle table (insert)? I am not able to find any template for it. – Nimmo Jun 18 '20 at 09:51
Once you have the data in S3 (specifically in the Standard storage class), you can use S3 lifecycle policies to transition objects to lower-cost storage options such as Glacier.
For more information on this you can see the documentation here: http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
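As an illustration, here is a minimal sketch of such a lifecycle rule using boto3; the bucket name, prefix, and 30-day threshold are placeholders, not recommendations:

import boto3

s3 = boto3.client("s3")

# Transition objects under a given prefix to Glacier after 30 days.
# Bucket, prefix, and day count are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-rds-export-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-rds-exports",
                "Filter": {"Prefix": "rds-exports/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)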
Cloudtechnician - alternatively, you may also open a case with AWS support. We would like to hear whether the documentation above is sufficient and, if not, how we can clarify it to better help you.
Please identify yourself with this Stack Overflow URL so that we have a reference.
Yes, I had a word with the AWS support team. They're also suggesting Data Pipeline or Lambda. Currently, this activity is on hold. I'm expecting to implement the Data Pipeline template shared by AravindR by the end of this month. I'll update this thread. Thank you so much guys :) – cloudtechnician Jan 14 '16 at 14:15
For an ad-hoc full table dump to S3 (not incremental), you can spawn an EC2 Windows instance and run Oracle_To_S3_Data_Uploader on it.
It's low touch: you provide it a SQL query and the target S3 bucket name.
Data will be compressed and streamed to S3 using the AWS multipart upload protocol.
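That tool's exact interface aside, the general idea can be sketched in Python with cx_Oracle and boto3 (all names, credentials, and paths below are placeholders); boto3's managed transfer switches to multipart upload automatically for large files:

import csv
import gzip

import boto3
import cx_Oracle  # assumed Oracle client library for this sketch

# Placeholder connection details and query -- substitute your own.
conn = cx_Oracle.connect("scott", "tiger", "my-rds-endpoint:1521/ORCL")
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table")

# Write the result set as gzip-compressed CSV to a local staging file.
dump_path = "my_table.csv.gz"
with gzip.open(dump_path, "wt", newline="") as gz:
    writer = csv.writer(gz)
    writer.writerow([col[0] for col in cursor.description])  # header row
    for row in cursor:
        writer.writerow(row)

# upload_file uses multipart upload under the hood for large files.
s3 = boto3.client("s3")
s3.upload_file(dump_path, "my-target-bucket", "oracle-dumps/my_table.csv.gz")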
