
I am trying to gzip files before uploading them to S3, but the files that end up in S3 show no change in size, so I am trying to figure out if I have missed something. The code is below.

import gzip
import shutil
from io import BytesIO


def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    """Compress and upload the contents from fp to S3.

    If compressed_fp is None, the compression is performed in memory.
    """
    if not compressed_fp:
        compressed_fp = BytesIO()
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    bucket.upload_fileobj(
        compressed_fp,
        key,
        {'ContentType': content_type, 'ContentEncoding': 'gzip'})
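For what it's worth, the in-memory compression step can be exercised on its own, without S3. The sketch below factors the gzip part of `upload_gzipped` into a hypothetical `gzip_in_memory` helper (the name and sample data are mine, not from the question) and checks that the output is both smaller and decompressible:

```python
import gzip
import shutil
from io import BytesIO


def gzip_in_memory(fp):
    """Compress the contents of fp into an in-memory buffer and rewind it."""
    compressed_fp = BytesIO()
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        # Note the argument order: source first, destination second.
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    return compressed_fp


source = BytesIO(b'hello world ' * 1000)  # 12,000 bytes of repetitive text
payload = gzip_in_memory(source).read()

# Repetitive text compresses well, so the payload should be far smaller
# than the 12,000-byte input, and it should round-trip cleanly.
print(len(payload))
assert len(payload) < 12000
assert gzip.decompress(payload) == b'hello world ' * 1000
```

If the object in S3 is the same size as the source, it is worth running a check like this locally on the actual bytes pulled from SFTP before blaming the upload step.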

Courtesy Link for the source

And this is how I am using the function: basically reading files as a stream from SFTP, gzipping them, and then writing them to S3.

with pysftp.Connection(host_name, username=user, password=password, cnopts=cnopts, port=int(port)) as sftp:
    list_of_files = sftp.listdir('{}{}'.format(base_path, file_path))
    is_file_found = False
    for file_name in list_of_files:
        if entity_name in str(file_name.lower()):
            is_file_found = True
            flo = BytesIO()
            # Step 1: Read the file from SFTP as an input stream
            sftp.getfo('{}{}/{}'.format(base_path, file_path, file_name), flo)
            s3_destination_key = '{}/{}'.format(s3_path, file_name)
            # Step 2: Write the file to the destination S3 bucket
            logger.info('Moving file to S3 {} '.format(s3_destination_key))
            # Creating a bucket resource to use bucket object for file upload
            input_bucket_object = S3.Bucket(environment_config['S3_INBOX_BUCKET'])
            flo.seek(0)
            upload_gzipped(input_bucket_object, s3_destination_key, flo)
  • I tested the gist and was able to upload a correctly gzipped file to S3. However, your code is incomplete so I cannot say if it works or not. I may be able to help more if you can provide a complete test case that reproduces your problem. – Doug Richardson Aug 24 '19 at 04:27

1 Answer


It seems like the upload_gzipped function uses shutil.copyfileobj incorrectly.

Looking at https://docs.python.org/3/library/shutil.html#shutil.copyfileobj shows that you put the source first, and destination second.

Also, you're just writing your object to a gzipped object without ever actually compressing it.

You need to compress fp into a Gzip object, then upload that specific object to S3.

I'd recommend not using that gist from GitHub, as it seems wrong.

  • Hi Baptiste. I tested the source code from the gist and it actually does work. The `shutil.copyfileobj` call has `gz` as the destination parameter because that's how you compress using gzip.GzipFile - you copy from the uncompressed file object to the `gz` file object. I'm not sure what the OP's problem is, since their code is incomplete (e.g., it references variables that are not defined, like `entity_name`). – Doug Richardson Aug 24 '19 at 04:25
  • @DougRichardson: the code itself is fine, other than the placeholders; the question is primarily about compressing a byte stream, or you could say in-memory compression. – noobie-php Aug 26 '19 at 10:24
  • @noobie-php in my tests the gzip compression worked. I tested with text files (which compress well). What kind of files are you testing with? – Doug Richardson Aug 26 '19 at 14:20
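That last question matters: gzip barely shrinks data that is already compressed (JPEG, ZIP, or already-gzipped files), so an unchanged size in S3 does not necessarily mean the compression step was skipped. A quick illustration of the difference, using random bytes as a stand-in for already-compressed content (the sizes here are illustrative, not from the question):

```python
import gzip
import os

text = b'the quick brown fox jumps over the lazy dog\n' * 500
random_bytes = os.urandom(22000)  # incompressible, like already-compressed data

# Repetitive text shrinks dramatically; random bytes do not shrink at all
# (gzip's header and framing can even make them slightly larger).
print(len(gzip.compress(text)))
print(len(gzip.compress(random_bytes)))
```

If the source files on SFTP are already compressed formats, the in-memory gzip step will appear to do nothing to the size even though it ran correctly.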