Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions

1 answer

AWS CLI moving file with wildcard (asterisk) in path

I am attempting to move a file from one S3 location to another using an activity in an AWS Data Pipeline. The command I am using is: (aws s3 mv s3://foobar/Tagger/out//*/lastImage.txt s3://foobar/Tagger/testInput/lastImage.txt) But I receive the…

amazon-web-services amazon-s3 amazon-data-pipeline

asked Jul 25 '15 at 18:06

BrainPermafrost

4 answers

Exporting an AWS Postgres RDS Table to AWS S3

I wanted to use AWS Data Pipeline to pipe data from a Postgres RDS to AWS S3. Does anybody know how this is done? More precisely, I wanted to export a Postgres table to AWS S3 using Data Pipeline. The reason I am using Data Pipeline is I want to…

postgresql amazon-web-services amazon-s3 amazon-rds amazon-data-pipeline

asked Oct 06 '16 at 14:51

Piyush Patil

4 answers

Automatic AWS DynamoDB to S3 export failing with "role/DataPipelineDefaultRole is invalid"

Precisely following the step-by-step instructions on this page I am trying to export contents of one of my DynamoDB tables to an S3 bucket. I create a pipeline exactly as instructed but it fails to run. It seems that it has trouble…

export amazon-dynamodb amazon-emr amazon-iam amazon-data-pipeline

asked Mar 06 '15 at 20:21

I Z

2 answers

How to use Data Pipeline to export a DynamoDB table that has on-demand provision

I used to use the Data Pipeline template called Export DynamoDB table to S3 to export a DynamoDB table to file. I recently updated all of my DynamoDB tables to have on-demand provision and the template no longer works. I'm pretty certain this is…

amazon-dynamodb amazon-data-pipeline

asked Feb 13 '19 at 09:35

F_SO_K

1 answer

Truncate DynamoDB or rewrite data via Data Pipeline

It is possible to dump a DynamoDB table via Data Pipeline and also to import data into DynamoDB. The import works, but the data is always appended to the data that already exists in DynamoDB. So far I have found working examples that scan DynamoDB and delete items one…

amazon-dynamodb truncate amazon-data-pipeline data-pipeline

asked Feb 17 '17 at 16:04

Vladimir Gilevich

2 answers

Which Policy is needed for elasticmapreduce:RunJobFlow in AWS?

I'm using AWS DataPipeline to run an aws-cli command that creates an EMR Cluster, but I'm getting the following error when the command runs: user ... is not authorized to perform: elasticmapreduce:RunJobFlow I want to associate the right Policy to…

amazon-web-services amazon-iam amazon-emr amazon-data-pipeline

asked Jun 17 '16 at 17:26

cahen

3 answers

Amazon Data Pipeline: How to use a script argument in a SqlActivity?

When trying to use a Script Argument in the sqlActivity: { "id" : "ActivityId_3zboU", "schedule" : { "ref" : "DefaultSchedule" }, "scriptUri" : "s3://location_of_script/unload.sql", "name" : "unload", "runsOn" : { "ref" : "Ec2Instance" }, …

amazon-web-services amazon-s3 amazon-redshift amazon-data-pipeline

asked Dec 15 '14 at 09:49

marnun

1 answer

AWS Datapipeline RedShiftCopyActivity - how to specify "columns"

I am trying to copy a bunch of CSV files from S3 to Redshift using the RedShiftCopyActivity and a data pipeline. This works fine as long as the CSV structure matches the table structure. In my case the CSV has fewer columns than the table and then…

amazon-web-services amazon-s3 amazon-redshift amazon-data-pipeline

asked Dec 04 '14 at 14:04

Peter

1 answer

EMR activity stuck in Waiting_For_Runner state

I am creating a data pipeline to export a DynamoDB table to an S3 bucket. I used the standard template for this in the Data Pipeline console. I have verified that the runsOn field is set to the name of the EMR cluster to be started. However, the EMR activity…

amazon-web-services emr amazon-data-pipeline

asked May 08 '14 at 07:21

user3610975

4 answers

AWS Data pipeline CSV data from S3 to DynamoDB

I am trying to transfer CSV data from an S3 bucket to DynamoDB using AWS Data Pipeline. The following is my pipeline script; it is not working properly. CSV file structure: Name, Designation, Company; A, TL, C1; B, Prog, C2. DynamoDB: N_Table, with Name as hash…

amazon-s3 amazon-dynamodb amazon-data-pipeline

asked Aug 03 '13 at 16:44

NKS

2 answers

Is aws datapipeline service being deprecated?

When I navigate to the AWS Data Pipeline console, it shows this banner: Please note that Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions. We plan to remove console access by 02/28/2023. Will AWS…

amazon-web-services deprecation-warning amazon-data-pipeline

asked Dec 13 '22 at 09:22

Dinuka Salwathura

3 answers

How to pipe data from AWS Postgres RDS to S3 (then Redshift)?

I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely. However, I also have data living in an RDS Postgres instance which I would like to pipe the same way but I'm having a hard…

postgresql amazon-web-services amazon-redshift amazon-data-pipeline

asked Nov 06 '14 at 14:21

jenswirf

1 answer

How to stop hive/pig install in Amazon Data Pipeline?

I don't need Hive or Pig, and Amazon Data Pipeline by default installs them on any EMR cluster it spins up. This makes testing take longer than it should. Any ideas on how to disable the install?

emr amazon-data-pipeline

asked Jan 17 '14 at 18:51

anvitron

1 answer

AWS Data Pipeline stuck on Waiting For Runner

My goal is to copy a table in a PostgreSQL database running on AWS RDS to a .csv file on Amazon S3. For this I use AWS Data Pipeline and found the following tutorial; however, when I follow all steps my pipeline is stuck at: "WAITING FOR RUNNER" see…

amazon-web-services amazon-data-pipeline

asked Jul 17 '18 at 09:06

Rutger Hofste

1 answer

AWS EMR Spark: Error: Cannot load main class from JAR

I am trying to submit a Spark job to an AWS EMR cluster using the AWS console. But it fails with: Cannot load main class from JAR. The job runs successfully when I specify main class as --class in Arguments option in AWS EMR Console -> Add Step. On the…

apache-spark amazon-emr amazon-data-pipeline

asked Jan 23 '18 at 17:40

Atish
