I wrote a script that uploads files to S3 using boto3. The script runs inside a Docker container as a cron job. Initially I set the AWS credentials in the Dockerfile using ENV, but later switched to bind-mounting /home/$USER/.aws/ on the host to /root/.aws/ in the container.
FROM python:3.7-alpine
WORKDIR /scripts
RUN pip install boto3
# ENV AWS_ACCESS_KEY_ID=
# ENV AWS_SECRET_ACCESS_KEY=
COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root
COPY s3_upload.py /scripts/s3_upload
RUN chmod a+x /scripts/s3_upload
RUN mkdir /root/info/
RUN touch /root/info/max_mod_time.json
RUN touch /root/info/error.log
RUN mkdir /root/.aws/
RUN touch /root/.aws/credentials
# RUN touch /root/.aws/config
ENTRYPOINT crond -f
version: '3.8'
services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload
    build:
      context: ./
    volumes:
      - ../data/features:/data
      - ./info:/root/info
      - ~/.aws/credentials:/root/.aws/credentials
      # - ~/.aws/config:/root/.aws/config
At this point the code uses my credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) for authentication and works perfectly.
I'm now trying to switch authentication to IAM roles. I've created a role in AWS called Upload_Data_To_S3 with the AmazonS3FullAccess policy attached.
I'm reading the docs on how to set up boto3 with IAM roles. I've set up my ~/.aws/config file as follows:
[default]
region=ca-central-1
[profile crossaccount]
role_arn=arn:aws:iam::#######:role/Upload_Data_To_S3
source_profile=
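For context, my understanding from the docs is that source_profile is supposed to name a profile whose long-lived keys are used to assume the role. A hypothetical filled-in pairing (the upload-user profile name is an assumption for illustration, not something I currently have) would look like:

```ini
# ~/.aws/credentials (hypothetical profile holding long-lived keys)
[upload-user]
aws_access_key_id=...
aws_secret_access_key=...

# ~/.aws/config (role-assuming profile referencing it)
[profile crossaccount]
role_arn=arn:aws:iam::#######:role/Upload_Data_To_S3
source_profile=upload-user
```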
I don't have the AWS CLI installed, so there are no other profiles besides my user on the AWS account. My Python code contains no authentication logic at all.
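If I understand correctly, boto3 reads the AWS_PROFILE environment variable when deciding which profile to use, so the script itself could stay free of auth code. A minimal sketch (the crossaccount profile name comes from my config above):

```python
import os

# boto3 checks AWS_PROFILE before falling back to [default], so setting it
# before boto3.client('s3') is created should select the role-assuming profile
os.environ['AWS_PROFILE'] = 'crossaccount'
print(os.environ['AWS_PROFILE'])  # crossaccount
```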
#!/usr/local/bin/python3
import boto3
from botocore.exceptions import ClientError
import os
import glob
import json
import time

# TODO: look into getting credentials from IAM role
s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'
max_mod_time = '0'

file_list = glob.glob('/data/*.json')  # get a list of feature files
file_mod_time = None

# get mod time for all files in the data directory
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]

# sort files by mod time (min -> max)
timestamp_sorted_file_info = sorted(file_info, key=lambda f: f['mod_time'])
# print('File Info Sorted by Time Stamp:\n', timestamp_sorted_file_info)

# if the state file exists and is not empty, set max_mod_time from it
if os.path.exists('/root/info/max_mod_time.json') and os.stat('/root/info/max_mod_time.json').st_size != 0:
    with open('/root/info/max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# upload the files to s3
mod_time_last_upload = '0'
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']  # set mod time for the current file
    # file_mod_time = '2020-09-19 13:28:53'  # for debugging
    file_name = os.path.basename(file['file'])  # get file name from file path
    if file_mod_time > max_mod_time:  # compare current file mod_time to max_mod_time from previous run
        with open(os.path.join('/data/', file_name), 'rb') as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

        # error check - https://stackoverflow.com/a/38376288/7582937
        # check if the file upload was successful
        try:
            s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
            mod_time_last_upload = file_mod_time
            print(file_name, ' is UPLOADED')
        except ClientError as error:
            # Not found
            if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                # save error to log file
                open('/root/info/error.log', 'w').write(str(error))
                print('error: ', error)
                break

print('File Mod Time: ', file_mod_time)
print('Mod Time Last Upload: ', mod_time_last_upload)

# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
# create JSON object to write to the file
object_to_write = json.dumps({'max_mod_time': mod_time_last_upload})

# write max_mod_time to the file to be passed to the next run
# (use '!=' rather than 'is not': identity comparison on strings is unreliable)
if mod_time_last_upload != '0':
    if object_to_write:
        open('/root/info/max_mod_time.json', 'w').write(object_to_write)
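As an aside on why the script can compare mod times as strings: timestamps rendered with '%Y-%m-%d %H:%M:%S' are zero-padded with the most significant field first, so lexicographic order matches chronological order. A quick check:

```python
import time

# zero-padded, big-endian timestamp strings sort lexicographically
# in the same order as the underlying epoch times
earlier = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(0))
later = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(86400))
print(earlier)          # 1970-01-01 00:00:00
print(later)            # 1970-01-02 00:00:00
print(earlier < later)  # True
```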
When I build and run the container I get the following error:
Traceback (most recent call last):
File "/scripts/s3_upload", line 40, in <module>
s3_client.upload_fileobj(f, s3_bucket_name, file_name)
File "/usr/local/lib/python3.7/site-packages/boto3/s3/inject.py", line 539, in upload_fileobj
return future.result()
File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/usr/local/lib/python3.7/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 337, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 643, in _make_api_call
operation_model, request_dict, request_context)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 662, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/usr/local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
That's understandable since I don't have the credentials in the container. What do I need to add to the code or the ~/.aws/config
file for it to use the IAM role I've set up? Unfortunately the docs aren't very clear in this regard.
Thanks in advance.