
I wrote a script that uploads files to S3 using boto3. It runs in Docker via a cron job. Initially I set the AWS credentials in the Dockerfile using ENV, and later switched to bind-mounting /home/$USER/.aws/ on the host to /root/.aws/ in the container.

FROM python:3.7-alpine

WORKDIR /scripts

RUN pip install boto3

# ENV AWS_ACCESS_KEY_ID=
# ENV AWS_SECRET_ACCESS_KEY=

COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root

COPY s3_upload.py /scripts/s3_upload
RUN chmod a+x /scripts/s3_upload

RUN mkdir /root/info/ && \
    touch /root/info/max_mod_time.json /root/info/error.log

RUN mkdir /root/.aws/ && \
    touch /root/.aws/credentials
# RUN touch /root/.aws/config


ENTRYPOINT crond -f

And here is my docker-compose.yml:

version: '3.8'
services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload 
    build:
      context: ./
    volumes:
      - ../data/features:/data
      - ./info:/root/info
      - ~/.aws/credentials:/root/.aws/credentials
      # - ~/.aws/config:/root/.aws/config

At this point the code authenticates with my credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) and works perfectly.

I'm trying to switch the authentication to IAM roles. I've created a role in AWS called Upload_Data_To_S3 with the AmazonS3FullAccess policy.
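For the role to be assumable at all, its trust policy has to allow my IAM user as a principal, something along these lines (the account ID and user name here are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:user/my-user" },
      "Action": "sts:AssumeRole"
    }
  ]
}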

I'm reading the docs on how to set up boto3 with IAM roles. I've set my ~/.aws/config as follows:

[default]
region=ca-central-1

[profile crossaccount]
role_arn=arn:aws:iam::#######:role/Upload_Data_To_S3
source_profile=

I don't have the AWS CLI installed, so there are no named profiles besides my user on the AWS account. My Python code contains nothing related to authentication.

#!/usr/local/bin/python3

import boto3
from botocore.errorfactory import ClientError
import os
import glob
import json
import time

# TODO: look into getting credentials from IAM role
s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'

max_mod_time = '0'
file_list = glob.glob('/data/*.json')  # get a list of feature files
file_mod_time = None

# get mod time for all files in data directory
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]
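# note: this timestamp format ('%Y-%m-%d %H:%M:%S') sorts
# lexicographically in chronological order, which is what makes the
# plain string comparison against max_mod_time below work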

# sort files by mod time (min -> max)
timestamp_sorted_file_info = sorted(file_info, key=lambda f: f['mod_time'])
# print('File Info Sorted by Time Stamp:\n',timestamp_sorted_file_info)

# check if the file exists and not empty -> set max_mod_time from it
if os.path.exists('/root/info/max_mod_time.json') and os.stat('/root/info/max_mod_time.json').st_size != 0:
    with open('/root/info/max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# upload the files to s3
mod_time_last_upload = "0"
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']  # set mod time for the current file
    # file_mod_time = '2020-09-19 13:28:53' # for debugging
    file_name = os.path.basename(file['file'])  # get file name from file path

    if file_mod_time > max_mod_time:  # compare current file mod_time to max_mod_time from previous run
        with open(os.path.join('/data/', file_name), "rb") as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

            # error check - https://stackoverflow.com/a/38376288/7582937
            # check if the file upload was successful
            try:
                s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
                mod_time_last_upload = file_mod_time
                print(file_name, ' is UPLOADED')
            except ClientError as error:
                # Not found
                if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                    # append the error to the log file ('w' would
                    # overwrite the log on every run)
                    open('/root/info/error.log', 'a').write(str(error) + '\n')
                    print("error: ", error)
                break

        print('File Mod Time: ', file_mod_time)
        print('Mod Time Last Upload: ', mod_time_last_upload)


# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
# create JSON object to write to the file
object_to_write = json.dumps(
    {"max_mod_time": mod_time_last_upload})

# write max_mod_time to the file to be passed to the next run
if mod_time_last_upload != "0":  # 'is not' would compare object identity, not value
    if object_to_write:
        open('/root/info/max_mod_time.json', 'w').write(str(object_to_write))

When I build and run the container I get the following error:

Traceback (most recent call last):
  File "/scripts/s3_upload", line 40, in <module>
    s3_client.upload_fileobj(f, s3_bucket_name, file_name)
  File "/usr/local/lib/python3.7/site-packages/boto3/s3/inject.py", line 539, in upload_fileobj
    return future.result()
  File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 337, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 643, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 662, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

That's understandable since I don't have the credentials in the container. What do I need to add to the code or the ~/.aws/config file for it to use the IAM role I've set up? Unfortunately the docs aren't very clear in this regard.

Thanks in advance.

Yury Stanev
  • https://stackoverflow.com/questions/44171849/aws-boto3-assumerole-example-which-includes-role-usage has an AssumeRole example that may help you (a minimal sketch of that approach follows these comments). – Avinash Dalvi Sep 28 '20 at 16:35
  • If you're running the service on an EC2 instance with an instance profile assigned, container processes can use the EC2 metadata service `http://169.254.169.254/` to get credentials for the instance's IAM role; Boto will handle this automatically. – David Maze Sep 28 '20 at 17:50
  • @DavidMaze as far as I know it's not running on EC2, the only thing I have on EC2 is OpenVPN Access Server. – Yury Stanev Sep 28 '20 at 17:52
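For reference, the explicit AssumeRole approach from the linked question looks roughly like this. This is a sketch, not the asker's code: it assumes the default credential chain can already reach STS, and the role ARN and session name are placeholders.

import boto3

# exchange the caller's credentials for temporary role credentials;
# the role ARN and session name below are placeholders
sts = boto3.client('sts')
assumed = sts.assume_role(
    RoleArn='arn:aws:iam::111122223333:role/Upload_Data_To_S3',
    RoleSessionName='s3-upload',
)
creds = assumed['Credentials']

# build an S3 client from the temporary credentials
s3_client = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)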

1 Answer


Try this:

import boto3

# use the named profile from ~/.aws/config; for a profile that defines
# role_arn and source_profile, boto3 calls sts:AssumeRole automatically
session = boto3.Session(profile_name="crossaccount")
s3 = session.client("s3")
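For this to work, source_profile has to name a profile that actually holds credentials, and both files have to be visible inside the container (the compose file above only mounts ~/.aws/credentials; ~/.aws/config needs to be mounted too). A sketch of a matching pair, assuming the long-lived keys live in the default profile; the account ID is a placeholder:

~/.aws/credentials:

[default]
aws_access_key_id=YOUR_KEY_ID
aws_secret_access_key=YOUR_SECRET_KEY

~/.aws/config:

[default]
region=ca-central-1

[profile crossaccount]
role_arn=arn:aws:iam::111122223333:role/Upload_Data_To_S3
source_profile=default

With this in place, boto3 exchanges the default profile's keys for temporary role credentials on its own; no authentication code is needed in the script.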
Ihor Shylo