To achieve this, I would suggest adding some detail to your current flow:
In the submission script:
- Upload/Refresh any dependencies on the S3 bucket.
- Launch an EC2 instance.
In the EC2 instance:
- Download dependencies.
- Do work.
- Upload the results to S3.
- Terminate instance.
There are two simple ways to run commands on an EC2 instance: SSH, or the user-data attribute. For simplicity, and for your current use case, I would recommend the user-data method.
First, create an EC2 instance profile with permissions to download from and upload to the S3 bucket.
Then launch an EC2 instance, install any Python or pip packages your script needs, and register it as an AMI.
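As a one-time setup sketch (the role, profile, and bucket names here are hypothetical placeholders, not anything from your setup), the instance profile can be created with boto3 using admin credentials:

```python
import json

BUCKET = "my-work-bucket"            # placeholder: your S3 bucket
ROLE_NAME = "ec2-worker-role"        # placeholder names
PROFILE_NAME = "ec2-worker-profile"

# Policy allowing the instance to read/write objects in the bucket.
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}

# Trust policy letting EC2 assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def create_instance_profile(iam):
    """Create the role, attach the S3 policy, and wrap it in an instance profile."""
    iam.create_role(RoleName=ROLE_NAME,
                    AssumeRolePolicyDocument=json.dumps(trust_policy))
    iam.put_role_policy(RoleName=ROLE_NAME, PolicyName="s3-access",
                        PolicyDocument=json.dumps(s3_policy))
    iam.create_instance_profile(InstanceProfileName=PROFILE_NAME)
    iam.add_role_to_instance_profile(InstanceProfileName=PROFILE_NAME,
                                     RoleName=ROLE_NAME)

# import boto3; create_instance_profile(boto3.client("iam"))  # run once
```

You then pass `PROFILE_NAME` as the `IamInstanceProfile` when launching instances, so the worker never needs hard-coded credentials.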
Here is some reference code. Note that it is Python 3, and the instance-side parts assume a Windows AMI.
submission.py:
import boto3

s3_client = boto3.client('s3')
ec2 = boto3.resource('ec2')

# bucket_name, image_id, and the other bare names below are placeholders
# for your own values.
deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

# Upload/refresh the dependencies on S3.
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.upload_file(local, bucket_name, remote)

# The user-data script runs on first boot (PowerShell, for a Windows AMI).
user_data = f"""<powershell>
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""

instance = ec2.create_instances(
    MinCount=1,
    MaxCount=1,
    ImageId=image_id,                  # your pre-built AMI
    InstanceType=your_ec2_type,
    KeyName=your_key_name,
    IamInstanceProfile={
        'Name': instance_profile_name  # the profile with S3 permissions
    },
    SecurityGroupIds=[
        instance_security_group,
    ],
    UserData=user_data
)
instance_worker.py:
import subprocess

import boto3

s3_client = boto3.client('s3')
ec2_client = boto3.client('ec2')

deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

# Download the dependencies from S3.
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.download_file(bucket_name, remote, local)

result = do_work()
# write results to file, then upload them to S3
s3_client.upload_file(result_file, bucket_name, result_remote)

# Get the instance ID from inside (this snippet is for Windows machines).
p = subprocess.Popen(
    ["powershell.exe",
     "(Invoke-WebRequest -Uri 'http://169.254.169.254/latest/meta-data/instance-id').Content"],
    stdout=subprocess.PIPE)
out = p.communicate()[0]
instance_id = out.strip().decode('ascii')

ec2_client.terminate_instances(InstanceIds=[instance_id])
In this code, I terminate the instance from within; to do that, you must first obtain the instance_id. Have a look here for more references.
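If you prefer not to shell out to PowerShell, the metadata endpoint can also be queried directly from Python, which works the same on Windows and Linux. A minimal sketch (note that this assumes the default IMDSv1 behaviour; if your instance enforces IMDSv2, you must first request a session token):

```python
import urllib.request

# The EC2 instance metadata service endpoint for the instance ID.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"

def get_instance_id(timeout=2):
    """Fetch this instance's own ID from the metadata service."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()
```

The returned string (e.g. an ID of the form `i-...`) can be passed straight to `terminate_instances`.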
Finally, to your question: how do you ensure that the Python environment on the instance has all the packages needed to successfully run your work script?
In theory, you can use the user data to run any scripts or CLI commands you would like, including installing Python and pip dependencies. But if the setup is too complicated or heavy to install on every boot, I would suggest you build an image and launch from it, as mentioned before.
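For lighter dependencies, the user-data approach can look like the sketch below, where the package list and paths are placeholders you would replace with your own:

```python
packages = ["numpy", "boto3"]      # placeholder: packages your worker needs
worker_dir = "C:\\worker"          # placeholder: where the worker lives on the AMI
worker_script = "instance_worker.py"

# Install the packages on first boot, then run the worker.
user_data = f"""<powershell>
python -m pip install {' '.join(packages)}
cd {worker_dir}; python {worker_script}
</powershell>
"""
```

This keeps the AMI generic; the trade-off is a slower boot and a dependency on network access to the package index at launch time.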