I've made a Python script that uploads files to an S3 bucket. I need the script to run periodically from within a Docker container.
#!/usr/local/bin/python3
import boto3
from botocore.errorfactory import ClientError
import os
import glob
import json
import time

s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'

uploaded = None
max_mod_time = '0'
file_list = glob.glob('/data/*.json')
file_mod_time = None

# get mod time for all files in the data directory
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]
timestamp_sorted_file_info = sorted(file_info, key=lambda f: f['mod_time'])

# load the most recent mod time from the previous run, if present
if os.path.exists('max_mod_time.json'):
    with open('max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# TODO: fix strange behavior in Docker Container

# upload the files to s3
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']
    # file_mod_time = '2020-09-19 13:28:53'  # for debugging
    file_name = os.path.basename(file['file'])
    uploaded = False

    if file_mod_time > max_mod_time:
        with open(os.path.join('/data/', file_name), "rb") as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

        # error check - https://stackoverflow.com/a/38376288/7582937
        try:
            s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
        except ClientError as error:
            # Not found
            if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                raise error

        uploaded = True

# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
object_to_write = json.dumps({"max_mod_time": file_mod_time})

if uploaded:
    if object_to_write:
        open('max_mod_time.json', 'w').write(str(object_to_write))
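For reference, after a successful run the script writes a small state file into its working directory; its contents look something like this (the timestamp here is only an example):

$ cat max_mod_time.json
{"max_mod_time": "2020-09-19 13:28:53"}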
I'm using crond within the 3.7-alpine Python container. My Dockerfile is below:
FROM python:3.7-alpine
WORKDIR /scripts
RUN pip install boto3
ENV AWS_ACCESS_KEY_ID=############
ENV AWS_SECRET_ACCESS_KEY=###################
COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root
COPY s3_upload.py /scripts/s3_upload.py
RUN chmod a+x /scripts/s3_upload.py
ENTRYPOINT crond -f
The script is supposed to run periodically and upload any new files to the S3 bucket; below is my crontab file.
5-10/1 * * * * /bin/pwd; /scripts/s3_upload
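For clarity, the 5-10/1 * * * * schedule fires once a minute during minutes 5 through 10 of every hour. An every-minute entry for testing, with the command left exactly as above, would look like this:

* * * * * /bin/pwd; /scripts/s3_upload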
I'm using docker-compose.yml to build and bring up the container and to sync a host directory to a directory in the container.
version: '3.8'
services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload
    build:
      context: ./s3-data-upload/
    volumes:
      - ./data/features:/data
After running docker-compose build and docker-compose up, all I get as output is this:
Creating highspeed_s3-data-transfer_1 ... done
Attaching to highspeed_s3-data-transfer_1
It just hangs there. I've manually tested the script by attaching to the container, creating files, and running the upload script; it works as it should when run manually.
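For completeness, this is roughly what the manual test looks like (the container name comes from the docker-compose output above; the test file name is just an example):

$ docker exec -it highspeed_s3-data-transfer_1 /bin/sh
/scripts # touch /data/test-feature.json
/scripts # /scripts/s3_upload.py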
There seems to be something wrong with the crond config/setup, but I don't see anything that could cause issues.
How can I fix this? Any suggestions are welcome.
Thank you.