Questions tagged [aws-data-pipeline]

Use amazon-data-pipeline tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

80 questions
33
votes
1 answer

AWS Data Pipeline vs Step Functions

I am working on a problem where we intend to perform multiple transformations on data using EMR (SparkSQL). After going through the documentation of AWS Data Pipeline and AWS Step Functions, I am slightly confused as to what the use case for each…
5
votes
2 answers

AWS Data Pipeline: Issue with permissions S3 Access for IAM role

I'm using the Load S3 data into RDS MySql table template in AWS Data Pipeline to import CSVs from an S3 bucket into our RDS MySQL. However, I (as an IAM user with full admin rights) run into a warning I can't solve: Object:Ec2Instance - WARNING: Could…
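Warnings like this usually mean the pipeline's resource role cannot read the bucket. A sketch of the kind of read-only S3 policy to attach to the resource role (the bucket name is a placeholder; `s3:ListBucket` applies to the bucket ARN, `s3:GetObject` to the objects under it):

```python
import json

BUCKET = "my-input-bucket"  # hypothetical bucket holding the CSVs

# Minimal read-only policy: list the bucket, fetch its objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The generated document can then be attached to `DataPipelineDefaultResourceRole` (the role the EC2 instance assumes, not the IAM user running the console).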
5
votes
2 answers

Scheduling data extraction from AWS Redshift to S3

I am trying to build a job that extracts data from Redshift and writes the same data to S3 buckets. So far I have explored AWS Glue, but Glue is not capable of running custom SQL on Redshift. I know we can run UNLOAD commands and the output can be stored…
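The UNLOAD route the asker mentions takes an arbitrary SELECT, so custom SQL is not a blocker. A sketch that builds the statement (bucket and role ARN are placeholders; note that single quotes inside the query must be doubled within the UNLOAD literal):

```python
def build_unload(query: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift UNLOAD statement that writes query results to S3.

    Single quotes in the inner query are doubled, as UNLOAD takes the
    query as a quoted string literal.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV HEADER ALLOWOVERWRITE;"
    )

sql = build_unload(
    "SELECT id, total FROM sales WHERE sold_at >= '2020-01-01'",
    "s3://my-export-bucket/sales/",                    # hypothetical bucket
    "arn:aws:iam::123456789012:role/RedshiftUnload",   # hypothetical role
)
print(sql)
```

The statement could be scheduled from a Data Pipeline SqlActivity or any cron-style trigger that can reach the cluster.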
4
votes
1 answer

How to export an AWS DynamoDB table to an S3 Bucket?

I have a DynamoDB table that has 1.5 million records / 2 GB. How can I export this to S3? The AWS Data Pipeline method worked with a small table, but I am facing issues exporting the 1.5-million-record table to my S3. At my initial…
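Whatever the transport, a full export has to follow DynamoDB's `LastEvaluatedKey` pagination, since a single Scan call returns at most 1 MB. A runnable sketch of that loop against a stub standing in for the boto3 client:

```python
def scan_all(client, table_name: str, page_limit: int = 1000):
    """Scan an entire DynamoDB table, following LastEvaluatedKey pagination.

    `client` is anything exposing a boto3-style scan(...) method.
    """
    items, kwargs = [], {"TableName": table_name, "Limit": page_limit}
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last = page.get("LastEvaluatedKey")
        if not last:
            return items
        kwargs["ExclusiveStartKey"] = last  # resume where the last page ended

class FakeDynamo:
    """Stub with a boto3-shaped scan() so the pattern is runnable here."""
    def __init__(self, pages):
        self.pages = pages
        self.calls = 0
    def scan(self, **kwargs):
        page = self.pages[self.calls]
        self.calls += 1
        return page

fake = FakeDynamo([
    {"Items": [{"id": {"S": "1"}}], "LastEvaluatedKey": {"id": {"S": "1"}}},
    {"Items": [{"id": {"S": "2"}}]},  # final page: no LastEvaluatedKey
])
rows = scan_all(fake, "MyTable")
print(len(rows))  # 2
```

For a table of this size, DynamoDB's native S3 export (`export_table_to_point_in_time`, which requires point-in-time recovery to be enabled) may avoid the pipeline entirely.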
3
votes
2 answers

Is it possible to update and insert data in an AWS Glue database using Glue?

So I am using PySpark on AWS, and have gigabytes of data every day that gets updated. I want to find the id of each record in an existing table in the Glue database, update the row if the id already exists, and insert it if the id does not exist. Is it possible to…
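Glue tables backed by plain S3 files do not support in-place updates, so the usual pattern is read, merge by key, rewrite. The merge itself, sketched in plain Python rather than PySpark (incoming rows win on a key collision):

```python
def upsert_by_id(existing, incoming, key="id"):
    """Merge incoming rows into existing rows: update rows whose key
    already exists, insert the rest (an upsert over plain records)."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # overwrite on match, insert otherwise
    return list(merged.values())

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(upsert_by_id(old, new))
```

In Spark the equivalent is typically an outer join on the key (or a table format with MERGE support, such as Delta or Hudi) followed by rewriting the partition.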
3
votes
2 answers

Avoid running the Install Task Runner step in an EMR cluster

I hope you can help me. I am trying to create an EMR cluster with Hadoop and Spark installed, using Data Pipeline. The problem is that this EMR cluster is private, so it has no internet access to download anything. In the pipeline I indicate bootstrap actions to…
3
votes
0 answers

Is there an equivalent of the Azure Integration Runtime for AWS Data pipeline?

I have previously had successful implementations of data transfer from on-premise SQL Server instances to Azure SQL using the Integration Runtime component in conjunction with Azure Data Factory. I am not very familiar with AWS but from what I have…
3
votes
0 answers

Import file data from S3 into RDS with transformation steps

I'm a novice AWS user trying to solve a use case where I need to import data from a CSV dropped into an S3 bucket to RDS. I have a CSV file that will be uploaded to an S3 bucket; from there I want to run a custom Python script to…
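The transformation step in between is ordinary CSV handling. A minimal sketch of the cleanup stage (column names and rules here are illustrative, not from the question):

```python
import csv
import io

def transform_csv(raw: str) -> list:
    """Parse a CSV export, normalize header names, and drop blank rows —
    the kind of cleanup to run between reading from S3 and loading to RDS."""
    reader = csv.DictReader(io.StringIO(raw))
    rows = []
    for row in reader:
        if not any(v.strip() for v in row.values()):
            continue  # skip completely empty rows
        rows.append({k.strip().lower(): v.strip() for k, v in row.items()})
    return rows

sample = "Name , Age\nAda,36\n , \n"
print(transform_csv(sample))  # [{'name': 'Ada', 'age': '36'}]
```

The cleaned rows can then be inserted with any MySQL client; running the script from a Lambda triggered by the S3 upload event is a common serverless shape for this.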
3
votes
1 answer

Permissions for creating and attaching an EBS volume to an EC2Resource in AWS Data Pipeline

I need more local disk than is available to EC2Resources in an AWS Data Pipeline. The simplest solution seems to be to create and attach an EBS volume. I have added the EC2:CreateVolume and EC2:AttachVolume policies to both DataPipelineDefaultRole and…
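A sketch of an inline policy covering the full volume lifecycle the instance would need (the action names are real EC2 API actions; scoping `Resource` more tightly than `*` is advisable in practice):

```python
import json

# Hypothetical inline policy for both DataPipelineDefaultRole and
# DataPipelineDefaultResourceRole: create, attach, and later tear down
# the extra EBS volume.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DeleteVolume",
                "ec2:DescribeVolumes",
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Note that the attach call runs under the *resource* role (the instance profile), so adding the actions only to `DataPipelineDefaultRole` is not enough.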
2
votes
0 answers

AWS DynamoDB table not created via EC2 using AWS Data Pipeline (Data Pipeline console under maintenance)

My EC2 instance, created via my pipeline, is not able to create my tableTest in DynamoDB. I'm not able to get more info in the console, since the website for AWS Data Pipeline is under maintenance... My pipeline configuration definitions.json: { …
2
votes
1 answer

AWS Data Pipeline name tag option for EC2 resource

I'm running a shell activity on an EC2 resource. Sample JSON for creating the EC2 resource: { "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", …
2
votes
0 answers

Data Pipeline: Stop creating empty file in S3

I am using AWS Data Pipeline to back up RDS table data on a certain condition and store that backup CSV file in an S3 bucket. It works fine when there is data to back up, but when there is no data the pipeline still creates an empty file…
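Since the copy activity writes an object even for an empty result set, one workaround is a cleanup pass after the pipeline run that removes zero-byte keys. A runnable sketch against a stub with boto3-shaped method names (bucket and prefix are placeholders):

```python
def delete_empty_objects(client, bucket: str, prefix: str) -> list:
    """Delete zero-byte objects under a prefix — a cleanup pass for
    pipelines that emit a file even when the extract returned no rows."""
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    removed = []
    for obj in resp.get("Contents", []):
        if obj["Size"] == 0:
            client.delete_object(Bucket=bucket, Key=obj["Key"])
            removed.append(obj["Key"])
    return removed

class FakeS3:
    """Stub mimicking the boto3 S3 client calls used above."""
    def __init__(self, objects):
        self.objects = objects
    def list_objects_v2(self, Bucket, Prefix):
        return {"Contents": [o for o in self.objects if o["Key"].startswith(Prefix)]}
    def delete_object(self, Bucket, Key):
        self.objects = [o for o in self.objects if o["Key"] != Key]

s3 = FakeS3([{"Key": "backup/a.csv", "Size": 0},
             {"Key": "backup/b.csv", "Size": 10}])
print(delete_empty_objects(s3, "my-bucket", "backup/"))  # ['backup/a.csv']
```

The alternative is a precondition check (count the rows first and skip the activity when zero), which avoids the write instead of undoing it.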
2
votes
1 answer

Export existing DynamoDB items to Lambda Function

Is there any AWS-managed solution which would allow me to perform what is essentially a data migration, using DynamoDB as the source and a Lambda function as the sink? I'm setting up a Lambda to process DynamoDB streams, and I'd like to be able to…
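One common backfill approach is to scan the table and invoke the same Lambda with synthetic, stream-shaped events so one handler serves both historical and live records. A sketch of the batching side (the record shape is an approximation of the DynamoDB Streams format, not an exact reproduction):

```python
def to_stream_batches(items, batch_size=25):
    """Wrap existing table items in DynamoDB-Streams-shaped events so a
    stream-processing Lambda can replay them as a backfill."""
    events = []
    for i in range(0, len(items), batch_size):
        records = [
            {"eventName": "INSERT", "dynamodb": {"NewImage": item}}
            for item in items[i:i + batch_size]
        ]
        events.append({"Records": records})
    return events

items = [{"id": {"S": str(n)}} for n in range(30)]
batches = to_stream_batches(items)
print(len(batches), len(batches[0]["Records"]))  # 2 25
```

Each batch could then be sent with `lambda.invoke(...)`; for a fully managed path, enabling the stream before a one-off touch-update of every item achieves a similar replay without custom code.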
1
vote
0 answers

How to pass a parameter to an AWS Glue workflow using Lambda

I am trying to pass a parameter from a Lambda trigger to an AWS Glue workflow, so it is available across all Glue jobs in the workflow. I am able to pass parameters to individual Glue jobs using a Lambda function, but I wanted…
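Workflow run properties are the mechanism for sharing a value across all jobs in a workflow: the Lambda sets them when starting the run, and each job reads them back with `get_workflow_run_properties`. A sketch of the request a handler would build (workflow and property names are hypothetical; as I understand it, Glue's `start_workflow_run` accepts a `RunProperties` map, but verify against the current boto3 docs):

```python
def build_run_request(workflow_name: str, event_params: dict) -> dict:
    """Build kwargs for glue.start_workflow_run(**req). Run properties
    must be string-to-string, so all values are coerced to str."""
    return {
        "Name": workflow_name,
        "RunProperties": {str(k): str(v) for k, v in event_params.items()},
    }

# e.g. inside the Lambda handler, with values pulled from the trigger event
req = build_run_request("nightly-etl", {"run_date": "2021-06-01", "env": "prod"})
print(req["RunProperties"]["run_date"])  # 2021-06-01
```

Inside each Glue job, the same values are available via `glue.get_workflow_run_properties(Name=..., RunId=...)`, using the workflow name and run id Glue passes to the job.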
1
vote
0 answers

Unable to download pip and boto3 on AWS EC2 machine used in AWS data pipeline

I'm using a Shell Command Activity that calls a Python script. This Python script uses boto3 to perform some functions. In the shell script in the Shell Command Activity, I'm trying to install boto3 onto the machine before calling my Python script. I'm…