
I am trying to upload large files from an EC2 instance to AWS S3 using Python multipart upload. The code runs successfully for files of around 40 GB, but for files above 70 GB it fails after uploading around 20%. My code and the error message are below:

```python
import threading
import boto3
import os
import sys
from boto3.s3.transfer import TransferConfig

s3 = boto3.resource('s3')

local_fs_path = sys.argv[1]
subject_area = sys.argv[2]
py_file = sys.argv[3]
s3_path = sys.argv[4]

BUCKET_NAME = "***********"


class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()


def multi_part_upload_with_s3():
    # Multipart upload
    config = TransferConfig(multipart_threshold=1024 * 5, max_concurrency=10,
                            multipart_chunksize=1024 * 5, use_threads=True)
    file_path = os.path.dirname(local_fs_path) + '/' + py_file
    key_path = s3_path + '/' + subject_area + '/' + py_file
    s3.meta.client.upload_file(file_path, BUCKET_NAME, key_path,
                               ExtraArgs={'ACL': 'public-read'},
                               Config=config,
                               Callback=ProgressPercentage(file_path))


if __name__ == '__main__':
    multi_part_upload_with_s3()
```
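For scale, a quick back-of-the-envelope sketch (an observation, not part of the original post): S3 caps a multipart upload at 10,000 parts, and each part except the last must be at least 5 MiB. `multipart_chunksize=1024 * 5` is only 5 KiB, so for a file of this size boto3 has to scale the part size up internally to stay under the part cap. The file size below is taken from the traceback further down:

```python
import math

MAX_PARTS = 10_000                 # S3's documented multipart part cap
MIN_PART_SIZE = 5 * 1024 * 1024    # 5 MiB minimum part size
file_size = 88_378_295_325         # ~82 GiB, from the traceback

# Parts implied by the requested 5 KiB chunk size - well over the cap.
requested_chunk = 1024 * 5
parts_requested = math.ceil(file_size / requested_chunk)

# The smallest chunk size that keeps the upload within 10,000 parts;
# boto3 adjusts toward something like this behind the scenes.
effective_chunk = math.ceil(file_size / MAX_PARTS)
```

So each part actually sent is on the order of 8-9 MB regardless of the 5 KiB setting; the configured chunk size is not what goes over the wire.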

The error message I am getting is:

```
/home/*****/etl/******/*******/*******/*********/***.20190510.dat  13080289280 / 88378295325.0  (14.80%)Traceback (most recent call last):
  File "/home/*****/etl/******/*******/*******/*********/multipart_load.py", line 46, in <module>
    multi_part_upload_with_s3()
  File "/home/*****/etl/******/*******/*******/*********/multipart_load.py", line 25, in multi_part_upload_with_s3
    Callback=ProgressPercentage(file_path)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 131, in upload_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 287, in upload_file
    filename, '/'.join([bucket, key]), e))
boto3.exceptions.S3UploadFailedError: Failed to upload /home/*****/etl/******/*******/*******/*********/********.20190510.dat to aws_bucket_name/******/*****/temp_dir/*******.20190510.dat: An error occurred (RequestTimeout) when calling the UploadPart operation (reached max retries: 4): Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
```
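Since the failure is a `RequestTimeout` on individual `UploadPart` calls, one mitigation worth trying (a hedged sketch, not a confirmed fix; the timeout and retry values are illustrative) is to give the underlying botocore client a longer socket timeout and a larger retry budget:

```python
import boto3
from botocore.config import Config

# Illustrative values: longer socket timeouts and more retries per part,
# so transient network stalls do not exhaust the retry budget.
client_config = Config(
    connect_timeout=60,
    read_timeout=300,
    retries={'max_attempts': 10},
)
s3 = boto3.resource('s3', config=client_config)
```

Lowering `max_concurrency` in the `TransferConfig` can also reduce contention on a bandwidth-limited link, at the cost of a slower upload.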
Basant Jain
  • The error message is indicating a problem in the callback. This might help: [Track download progress of S3 file using boto3 and callbacks](https://stackoverflow.com/q/41827963/174777) – John Rotenstein May 13 '19 at 07:13
  • Hi John, I don't think it is a callback problem; it's showing a connection error. But even then I will remove the callback and try to run the code again, because that callback is not important to me. – Basant Jain May 13 '19 at 08:13
  • Even after removing callback, the job is failing with the same message. Can you help me with this? – Basant Jain May 15 '19 at 09:53
  • The error message suggests network drop-outs. You could try turning on [boto3 logging](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/boto3.html#boto3.set_stream_logger) with `boto3.set_stream_logger('')` to obtain more detail. Is it a requirement to use your own code for this? It would be handy to use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/) instead. – John Rotenstein May 15 '19 at 11:56
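Following up on the logging suggestion in the last comment: `boto3.set_stream_logger('')` turns on debug output in one call. A minimal sketch of the equivalent using only the standard `logging` module, which also lets you target just the botocore loggers that show per-part requests and retries:

```python
import logging

# Send all log records, including botocore's per-request DEBUG output,
# to the console with timestamps.
logging.basicConfig(
    format='%(asctime)s %(name)s [%(levelname)s] %(message)s',
    level=logging.DEBUG,
)

# Make sure the loggers that trace UploadPart calls and retries
# are emitting at DEBUG.
logging.getLogger('botocore').setLevel(logging.DEBUG)
logging.getLogger('boto3').setLevel(logging.DEBUG)
```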

0 Answers