I am having an issue getting AWS Data Pipeline to run on an EC2 instance via a ShellCommandActivity.
I have been following the guide found here step by step: https://medium.com/@SarwatFatimaM/data-scientists-guide-setting-up-aws-datapipeline-for-running-python-etl-scripts-using-c6c8fa4de70d
The primary issue I am running into is that the pipeline hangs in the WAITING_FOR_RUNNER status.
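For context, the relevant objects in my pipeline definition (as exported from the Architect) look roughly like this; the instance type, AMI ID, and script path below are placeholders (the AMI is the Windows image I am launching):

```json
{
  "objects": [
    {
      "id": "DefaultResource1",
      "name": "DefaultResource1",
      "type": "Ec2Resource",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "instanceType": "t2.medium",
      "imageId": "ami-xxxxxxxxxxxxxxxxx",
      "terminateAfter": "2 Hours"
    },
    {
      "id": "ShellCommandActivity1",
      "name": "RunEtlBatchFile",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "DefaultResource1" },
      "command": "C:\\scripts\\run_etl.bat"
    }
  ]
}
```

The only real change from the guide is that the command points at the .bat file instead of a .sh script.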
I have confirmed that my Python script and the .bat file (I had to switch from .sh because I am using a Windows EC2 instance) both run inside the desired EC2 instance. However, from what I can tell, the issue stems from the warning I am receiving inside the Data Pipeline Architect:
Errors/Warnings
Object:DefaultResource1
WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.
I have tried editing the IAM roles so that DataPipelineDefaultRole and DataPipelineDefaultResourceRole both have the AmazonEC2FullAccess, AmazonS3FullAccess, AWSDataPipelineRole, and AWSDataPipeline_FullAccess managed policies attached, and I have also tried the inline policies suggested here: AWS Data Pipeline: Issue with permissions S3 Access for IAM role and here: https://forums.aws.amazon.com/thread.jspa?threadID=241048.
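To be concrete, the inline policy I attached to DataPipelineDefaultRole covers the actions named in the warning and looks roughly like the following (the bucket name is a placeholder for my actual bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:Get*", "s3:List*", "s3:Put*"],
      "Resource": [
        "arn:aws:s3:::my-pipeline-bucket",
        "arn:aws:s3:::my-pipeline-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "*"
    }
  ]
}
```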
I have let these policies sit for hours and rebuilt the pipeline a few times, but I still keep getting that specific warning. Do you have any ideas?