I am having trouble getting AWS Data Pipeline to run a ShellCommandActivity on an EC2 instance.

I have been following the guide found here step by step: https://medium.com/@SarwatFatimaM/data-scientists-guide-setting-up-aws-datapipeline-for-running-python-etl-scripts-using-c6c8fa4de70d

The primary issue is that the pipeline hangs in the WAITING_FOR_RUNNER status. I have confirmed that my Python script and .bat file (I had to change it from .sh because I am using a Windows EC2 instance) both run inside the desired EC2 instance. However, from what I can tell, the issue is a result of the warning I am receiving inside the Data Pipeline Architect:

Errors/Warnings
Object:DefaultResource1
WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.

I have tried editing the IAM roles so that DataPipelineDefaultRole and DataPipelineDefaultResourceRole both have the AmazonEC2FullAccess, AmazonS3FullAccess, AWSDataPipelineRole, and AWSDataPipeline_FullAccess policies attached. I have also tried the inline policies suggested in "AWS Data Pipeline: Issue with permissions S3 Access for IAM role" and in https://forums.aws.amazon.com/thread.jspa?threadID=241048.
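
For reference, the permissions named in the warning correspond roughly to the two policy documents sketched below. This is only an illustration built from the warning text: the helper names `build_s3_access_policy` and `build_trust_policy` are hypothetical, the wildcard resource is an assumption, and the service principal `datapipeline.amazonaws.com` is assumed to be the one the role must trust. It is not necessarily the exact policy that the linked threads suggest.

```python
import json


def build_s3_access_policy(resource="*"):
    """Permissions policy granting the S3 actions listed in the warning.

    The resource defaults to "*" purely for illustration (an assumption);
    a real policy would normally be scoped to specific bucket ARNs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*", "s3:Put*"],
                "Resource": resource,
            }
        ],
    }


def build_trust_policy(service="datapipeline.amazonaws.com"):
    """Trust policy letting the given service assume the role.

    The default service principal is an assumption for this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print both documents as JSON so they can be compared with the
    # policies currently attached to the role in the IAM console.
    print(json.dumps(build_s3_access_policy(), indent=2))
    print(json.dumps(build_trust_policy(), indent=2))
```

Comparing this output with what is actually attached to DataPipelineDefaultRole is a quick way to rule out a missing action or trust relationship before looking for other causes.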

I have let these policies sit for hours and I have rebuilt the pipeline a few times but I still keep getting that specific warning. Do you have any ideas?

WolVes

1 Answer

According to the AWS Data Pipeline documentation linked below, a custom AMI must have Linux installed. This setup therefore cannot currently run on a Windows EC2 instance; it has to run on a Linux EC2 instance.

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-ami.html

WolVes