Questions tagged [aws-data-pipeline]

Use amazon-data-pipeline tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

80 questions
33
votes
1 answer

AWS Data Pipeline vs Step Functions

I am working on a problem where we intend to perform multiple transformations on data using EMR (SparkSQL). After going through the documentation of AWS Data Pipeline and AWS Step Functions, I am slightly confused as to what the use case for each…
5
votes
2 answers

AWS Data Pipeline: Issue with permissions S3 Access for IAM role

I'm using the Load S3 data into RDS MySql table template in AWS Data Pipeline to import CSVs from an S3 bucket into our RDS MySQL. However, I (as an IAM user with full admin rights) run into a warning I can't solve: Object:Ec2Instance - WARNING: Could…
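Warnings like this usually mean the pipeline's resource role cannot read the bucket. A sketch of the kind of read-only S3 policy to attach to the resource role (the bucket name is a placeholder; `s3:ListBucket` applies to the bucket ARN, `s3:GetObject` to the objects under it):

```python
import json

BUCKET = "my-input-bucket"  # hypothetical bucket holding the CSVs

# Minimal read-only policy: list the bucket, fetch its objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The generated document can then be attached to `DataPipelineDefaultResourceRole` (the role the EC2 instance assumes, not the IAM user running the console).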
5
votes
2 answers

Scheduling data extraction from AWS Redshift to S3

I am trying to build a job that extracts data from Redshift and writes the same data to S3 buckets. So far I have explored AWS Glue, but Glue is not capable of running custom SQL on Redshift. I know we can run UNLOAD commands and the output can be stored…
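The UNLOAD route the asker mentions takes an arbitrary SELECT, so custom SQL is not a blocker. A sketch that builds the statement (bucket and role ARN are placeholders; note that single quotes inside the query must be doubled within the UNLOAD literal):

```python
def build_unload(query: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift UNLOAD statement that writes query results to S3.

    Single quotes in the inner query are doubled, as UNLOAD takes the
    query as a quoted string literal.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV HEADER ALLOWOVERWRITE;"
    )

sql = build_unload(
    "SELECT id, total FROM sales WHERE sold_at >= '2020-01-01'",
    "s3://my-export-bucket/sales/",                    # hypothetical bucket
    "arn:aws:iam::123456789012:role/RedshiftUnload",   # hypothetical role
)
print(sql)
```

The statement could be scheduled from a Data Pipeline SqlActivity or any cron-style trigger that can reach the cluster.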
4
votes
1 answer

How to export an AWS DynamoDB table to an S3 Bucket?

I have a DynamoDB table that has 1.5 million records / 2 GB. How can I export this to S3? The AWS Data Pipeline method worked with a small table, but I am facing issues exporting the 1.5-million-record table to my S3. At my initial…
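Whatever the transport, a full export has to follow DynamoDB's `LastEvaluatedKey` pagination, since a single Scan call returns at most 1 MB. A runnable sketch of that loop against a stub standing in for the boto3 client:

```python
def scan_all(client, table_name: str, page_limit: int = 1000):
    """Scan an entire DynamoDB table, following LastEvaluatedKey pagination.

    `client` is anything exposing a boto3-style scan(...) method.
    """
    items, kwargs = [], {"TableName": table_name, "Limit": page_limit}
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last = page.get("LastEvaluatedKey")
        if not last:
            return items
        kwargs["ExclusiveStartKey"] = last  # resume where the last page ended

class FakeDynamo:
    """Stub with a boto3-shaped scan() so the pattern is runnable here."""
    def __init__(self, pages):
        self.pages = pages
        self.calls = 0
    def scan(self, **kwargs):
        page = self.pages[self.calls]
        self.calls += 1
        return page

fake = FakeDynamo([
    {"Items": [{"id": {"S": "1"}}], "LastEvaluatedKey": {"id": {"S": "1"}}},
    {"Items": [{"id": {"S": "2"}}]},  # final page: no LastEvaluatedKey
])
rows = scan_all(fake, "MyTable")
print(len(rows))  # 2
```

For a table of this size, DynamoDB's native S3 export (`export_table_to_point_in_time`, which requires point-in-time recovery to be enabled) may avoid the pipeline entirely.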
3
votes
2 answers

Is it possible to update and insert data in an AWS Glue database using Glue?

So I am using PySpark on AWS, and have gigabytes of data every day that gets updated. I want to find the id of each record in an existing table in the Glue database, update the row if the id already exists, and insert it if the id does not exist. Is it possible to…
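Glue tables backed by plain S3 files do not support in-place updates, so the usual pattern is read, merge by key, rewrite. The merge itself, sketched in plain Python rather than PySpark (incoming rows win on a key collision):

```python
def upsert_by_id(existing, incoming, key="id"):
    """Merge incoming rows into existing rows: update rows whose key
    already exists, insert the rest (an upsert over plain records)."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # overwrite on match, insert otherwise
    return list(merged.values())

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(upsert_by_id(old, new))
```

In Spark the equivalent is typically an outer join on the key (or a table format with MERGE support, such as Delta or Hudi) followed by rewriting the partition.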
3
votes
2 answers

Avoid running the Install Task Runner step in an EMR cluster

I hope you can help me. I am trying to create an EMR cluster with Hadoop and Spark installed, using Data Pipeline. The problem is that this EMR cluster is private, so it has no internet access to download anything. In the pipeline I indicate bootstrap actions to…
3
votes
0 answers

Is there an equivalent of the Azure Integration Runtime for AWS Data pipeline?

I have previously had successful implementations of data transfer from on-premise SQL Server instances to Azure SQL using the Integration Runtime component in conjunction with Azure Data Factory. I am not very familiar with AWS but from what I have…
3
votes
0 answers

Import file data from S3 into RDS with transformation steps

I'm a novice AWS user trying to solve a use case where I need to import data from a CSV dropped into an S3 bucket to RDS. I have a CSV file that will be uploaded to an S3 bucket; from there I want to run a custom Python script to…
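The transformation step in between is ordinary CSV handling. A minimal sketch of the cleanup stage (column names and rules here are illustrative, not from the question):

```python
import csv
import io

def transform_csv(raw: str) -> list:
    """Parse a CSV export, normalize header names, and drop blank rows —
    the kind of cleanup to run between reading from S3 and loading to RDS."""
    reader = csv.DictReader(io.StringIO(raw))
    rows = []
    for row in reader:
        if not any(v.strip() for v in row.values()):
            continue  # skip completely empty rows
        rows.append({k.strip().lower(): v.strip() for k, v in row.items()})
    return rows

sample = "Name , Age\nAda,36\n , \n"
print(transform_csv(sample))  # [{'name': 'Ada', 'age': '36'}]
```

The cleaned rows can then be inserted with any MySQL client; running the script from a Lambda triggered by the S3 upload event is a common serverless shape for this.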
3
votes
1 answer

Permissions for creating and attaching an EBS volume to an EC2Resource in AWS Data Pipeline

I need more local disk than is available to EC2Resources in an AWS Data Pipeline. The simplest solution seems to be to create and attach an EBS volume. I have added the EC2:CreateVolume and EC2:AttachVolume policies to both DataPipelineDefaultRole and…
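A sketch of an inline policy covering the full volume lifecycle the instance would need (the action names are real EC2 API actions; scoping `Resource` more tightly than `*` is advisable in practice):

```python
import json

# Hypothetical inline policy for both DataPipelineDefaultRole and
# DataPipelineDefaultResourceRole: create, attach, and later tear down
# the extra EBS volume.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DeleteVolume",
                "ec2:DescribeVolumes",
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Note that the attach call runs under the *resource* role (the instance profile), so adding the actions only to `DataPipelineDefaultRole` is not enough.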
2
votes
0 answers

AWS DynamoDB table not created via EC2 using AWS Data Pipeline (Data Pipeline console under maintenance)

My EC2 instance, created via my pipeline, is not able to create my tableTest in DynamoDB. I'm not able to get more info in the console, since the website for AWS Data Pipeline is under maintenance... My pipeline configuration definitions.json: { …
2
votes
1 answer

AWS Data Pipeline name tag option for EC2 resource

I'm running a shell activity on an EC2 resource. Sample JSON for creating the EC2 resource: { "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", …
2
votes
0 answers

Data Pipeline: Stop creating empty file in S3

I am using AWS Data Pipeline to back up RDS table data on a certain condition and store that backup CSV file in an S3 bucket. It works fine when there is data to back up, but when there is no data the pipeline still creates an empty file…
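Since the copy activity writes an object even for an empty result set, one workaround is a cleanup pass after the pipeline run that removes zero-byte keys. A runnable sketch against a stub with boto3-shaped method names (bucket and prefix are placeholders):

```python
def delete_empty_objects(client, bucket: str, prefix: str) -> list:
    """Delete zero-byte objects under a prefix — a cleanup pass for
    pipelines that emit a file even when the extract returned no rows."""
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    removed = []
    for obj in resp.get("Contents", []):
        if obj["Size"] == 0:
            client.delete_object(Bucket=bucket, Key=obj["Key"])
            removed.append(obj["Key"])
    return removed

class FakeS3:
    """Stub mimicking the boto3 S3 client calls used above."""
    def __init__(self, objects):
        self.objects = objects
    def list_objects_v2(self, Bucket, Prefix):
        return {"Contents": [o for o in self.objects if o["Key"].startswith(Prefix)]}
    def delete_object(self, Bucket, Key):
        self.objects = [o for o in self.objects if o["Key"] != Key]

s3 = FakeS3([{"Key": "backup/a.csv", "Size": 0},
             {"Key": "backup/b.csv", "Size": 10}])
print(delete_empty_objects(s3, "my-bucket", "backup/"))  # ['backup/a.csv']
```

The alternative is a precondition check (count the rows first and skip the activity when zero), which avoids the write instead of undoing it.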
2
votes
1 answer

Export existing DynamoDB items to Lambda Function

Is there any AWS-managed solution which would allow me to perform what is essentially a data migration, using DynamoDB as the source and a Lambda function as the sink? I'm setting up a Lambda to process DynamoDB streams, and I'd like to be able to…
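One common backfill approach is to scan the table and invoke the same Lambda with synthetic, stream-shaped events so one handler serves both historical and live records. A sketch of the batching side (the record shape is an approximation of the DynamoDB Streams format, not an exact reproduction):

```python
def to_stream_batches(items, batch_size=25):
    """Wrap existing table items in DynamoDB-Streams-shaped events so a
    stream-processing Lambda can replay them as a backfill."""
    events = []
    for i in range(0, len(items), batch_size):
        records = [
            {"eventName": "INSERT", "dynamodb": {"NewImage": item}}
            for item in items[i:i + batch_size]
        ]
        events.append({"Records": records})
    return events

items = [{"id": {"S": str(n)}} for n in range(30)]
batches = to_stream_batches(items)
print(len(batches), len(batches[0]["Records"]))  # 2 25
```

Each batch could then be sent with `lambda.invoke(...)`; for a fully managed path, enabling the stream before a one-off touch-update of every item achieves a similar replay without custom code.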
1
vote
0 answers

How to pass a parameter to an AWS Glue workflow using Lambda

I am trying to pass a parameter from a Lambda trigger to an AWS Glue workflow, so it is available across all Glue jobs in the workflow. I am able to pass parameters to individual Glue jobs using a Lambda function, but I wanted…
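Workflow run properties are the mechanism for sharing a value across all jobs in a workflow: the Lambda sets them when starting the run, and each job reads them back with `get_workflow_run_properties`. A sketch of the request a handler would build (workflow and property names are hypothetical; as I understand it, Glue's `start_workflow_run` accepts a `RunProperties` map, but verify against the current boto3 docs):

```python
def build_run_request(workflow_name: str, event_params: dict) -> dict:
    """Build kwargs for glue.start_workflow_run(**req). Run properties
    must be string-to-string, so all values are coerced to str."""
    return {
        "Name": workflow_name,
        "RunProperties": {str(k): str(v) for k, v in event_params.items()},
    }

# e.g. inside the Lambda handler, with values pulled from the trigger event
req = build_run_request("nightly-etl", {"run_date": "2021-06-01", "env": "prod"})
print(req["RunProperties"]["run_date"])  # 2021-06-01
```

Inside each Glue job, the same values are available via `glue.get_workflow_run_properties(Name=..., RunId=...)`, using the workflow name and run id Glue passes to the job.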
1
vote
0 answers

Unable to download pip and boto3 on AWS EC2 machine used in AWS data pipeline

I'm using a Shell Command Activity that calls a Python script. This Python script uses boto3 to perform some functions. In the shell script in the Shell Command Activity, I'm trying to install boto3 onto the machine before calling my Python script. I'm…