I am working with an AWS Data Pipeline that has a ShellCommandActivity that sets the script uri to bash file located in a s3 bucket. The bash file copies a python script located in the same s3 bucket to a EmrCluster and then the script tries to execute that python script.
This is my pipeline export:
{
"objects": [
{
"name": "DefaultResource1",
"id": "ResourceId_27dLM",
"amiVersion": "3.9.0",
"type": "EmrCluster",
"region": "us-east-1"
},
{
"failureAndRerunMode": "CASCADE",
"resourceRole": "DataPipelineDefaultResourceRole",
"role": "DataPipelineDefaultRole",
"pipelineLogUri": "s3://project/bin/scripts/logs/",
"scheduleType": "ONDEMAND",
"name": "Default",
"id": "Default"
},
{
"stage": "true",
"scriptUri": "s3://project/bin/scripts/RunPython.sh",
"name": "DefaultShellCommandActivity1",
"id": "ShellCommandActivityId_hA57k",
"runsOn": {
"ref": "ResourceId_27dLM"
},
"type": "ShellCommandActivity"
}
],
"parameters": []
}
This is RunPython.sh:
#!/usr/bin/env bash
aws s3 cp s3://project/bin/scripts/Test.py ./
python ./Test.py
This is Test.py
__author__ = 'MrRobot'
import re
import os
import sys
import boto3
print "We've entered the python file"
From the Stdout Log I get:
download: s3://project/bin/scripts/Test.py to ./
From the Stdeer Log I get:
python: can't open file 'Test.py': [Errno 2] No such file or directory
I have also tried replacing python ./Test.py with python Test.py, but I get the same result.
How do I get my AWS Data Pipeline to execute my Test.py script.
EDIT
When I set scriptUri to s3://project/bin/scripts/Test.py I get the following errors :
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 1: author: command not found /mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 2: import: command not found /mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 3: import: command not found /mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 4: import: command not found /mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 5: import: command not found /mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 7: print: command not found
EDIT 2
Added the following line to Test.py
#!/usr/bin/env python
Then I received the following error:
error: line 6, in import boto3 ImportError: No module named boto3
using @franklinsijo 's advice I created a Bootstrap Action on the EmrCluster with the following value:
s3://project/bin/scripts/BootstrapActions.sh
This is BootstrapActions.sh
#!/usr/bin/env bash
sudo pip install boto3
This worked!!!!!!!