
I've made a Python script that uploads files into an S3 bucket. I need the script to run periodically from within a Docker container.

#!/usr/local/bin/python3

import boto3
from botocore.errorfactory import ClientError
import os
import glob
import json
import time

s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'

uploaded = None
max_mod_time = '0'
file_list = glob.glob('/data/*.json')
file_mod_time = None

# get mod time for all files in the data directory
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]

timestamp_sorted_file_info = sorted(file_info, key = lambda f: f['mod_time'])

if os.path.exists('max_mod_time.json'):
    with open('max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# TODO: fix strange behavior in Docker Container
# upload the files to S3
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']
    # file_mod_time = '2020-09-19 13:28:53' # for debugging
    file_name = os.path.basename(file['file'])
    uploaded = False

    if file_mod_time > max_mod_time:
        with open(os.path.join('/data/', file_name), "rb") as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

            # error check - https://stackoverflow.com/a/38376288/7582937
            try:
                s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
            except ClientError as error:
                # Not found
                if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                    raise error

        uploaded = True

# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
object_to_write = json.dumps(
    {"max_mod_time": file_mod_time})

if uploaded and object_to_write:
    with open('max_mod_time.json', 'w') as out:
        out.write(object_to_write)

I'm using crond inside the python:3.7-alpine container. My Dockerfile is below:

FROM python:3.7-alpine

WORKDIR /scripts

RUN pip install boto3

ENV AWS_ACCESS_KEY_ID=############
ENV AWS_SECRET_ACCESS_KEY=###################

COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root

COPY s3_upload.py /scripts/s3_upload.py
RUN chmod a+x /scripts/s3_upload.py

ENTRYPOINT crond -f

The script is supposed to run periodically and upload any new files to the S3 bucket; below is my crontab file.

5-10/1 * * * * /bin/pwd; /scripts/s3_upload

I'm using a docker-compose.yml to build and bring up the container and to sync a host directory to a directory in the container.

version: '3.8'
services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload 
    build:
      context: ./s3-data-upload/
    volumes:
      - ./data/features:/data

After running docker-compose build and docker-compose up, all the output I get is this:

Creating highspeed_s3-data-transfer_1 ... done
Attaching to highspeed_s3-data-transfer_1

It just hangs there. I've manually tested the script by attaching to the container, creating files, and running the upload script, and it works as it should when run manually.
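For example, something along these lines works and the file shows up in the bucket (the container name comes from the docker-compose output above; the test file name is just a placeholder):

docker exec -it highspeed_s3-data-transfer_1 sh
cd /scripts
touch /data/test.json
python3 s3_upload.py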

There seems to be something wrong with the crond config/setup, but I don't see anything that could cause issues.
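One thing I'm considering is raising crond's log verbosity so I can at least see whether the jobs fire at all. Assuming the BusyBox crond in this image supports the usual -l/-L options, the ENTRYPOINT could become something like:

ENTRYPOINT ["crond", "-f", "-l", "2", "-L", "/dev/stdout"]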

How can I fix this? Any suggestions are welcome.

Thank You.


1 Answer


After a while I was able to fix the problem by setting the timing in my crontab properly. The original schedule, 5-10/1, only fires during minutes 5 through 10 of each hour; the entry below should run the job every 10 minutes, starting at minute 4:

4/10 * * * * /bin/pwd; /scripts/s3_upload
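For reference, assuming BusyBox crond also accepts the standard */N step syntax, a similar "every ten minutes" entry would be:

*/10 * * * * /bin/pwd; /scripts/s3_upload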