
I'm trying to compress all the files in a directory on an S3 bucket, keeping the same directory structure, and put the resulting zip back on the S3 bucket.

Unpacking a zip file from an S3 bucket back into an S3 bucket is quite easy with BytesIO and zipfile, but I'm not sure how to go the other direction with a directory containing a hundred files.
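
For context, here's roughly what I mean by the unpack direction (the bucket and key names are just placeholders):

import boto3
import io
import zipfile

s3 = boto3.client('s3')

# Pull the zip into memory, then re-upload each member under a new prefix
obj = s3.get_object(Bucket='mybucket', Key='archive.zip')
with zipfile.ZipFile(io.BytesIO(obj['Body'].read())) as zf:
    for name in zf.namelist():
        if name.endswith('/'):  # skip directory entries
            continue
        s3.put_object(Bucket='mybucket', Key=f'unpacked/{name}', Body=zf.read(name))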

I found this post helpful, but it's for Node.js on Lambda: Create a zip file on S3 from files on S3 using Lambda Node


1 Answer

To avoid downloading the individual objects onto disk, you can stream each object under the prefix into an in-memory zip (remember: S3 keys are flat, so a "directory" is really just a key prefix), save the zip locally, upload it to S3, then delete the local copy. Here's the code I would use (and tested successfully in AWS):

import boto3
import io
import os
import zipfile

s3 = boto3.client('s3')

def zip_files(bucket_name, prefix):
    # Create a BytesIO object to hold the zip archive in memory
    zip_buffer = io.BytesIO()

    # Open the archive once and stream every object under the prefix into it.
    # (Re-opening it with mode 'w' inside the loop would clobber earlier entries.)
    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        # Paginate so prefixes with more than 1000 keys are fully listed
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            for obj in page.get('Contents', []):
                s3_object = s3.get_object(Bucket=bucket_name, Key=obj['Key'])
                # Write under the full key so the directory structure is preserved
                zip_file.writestr(obj['Key'], s3_object['Body'].read())

    # Save the zip file to disk
    local_zip = f'{prefix.rstrip("/")}.zip'
    with open(local_zip, 'wb') as f:
        f.write(zip_buffer.getvalue())

    # Upload the compressed data to the S3 bucket, then delete the local copy
    zip_buffer.seek(0)
    s3.put_object(Bucket=bucket_name, Key=f'{prefix}{local_zip}', Body=zip_buffer)
    os.remove(local_zip)

bucket = 'foobucket'
folders = ['foo/', 'bar/', 'baz/']
for folder in folders:
    zip_files(bucket, folder)

You haven't provided any Python code showing that you're hitting the same memory limit described in the Lambda Node post you linked, so I'm assuming that isn't a huge concern. Either way, the os.remove keeps the local footprint small as the loop moves through each prefix.
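
If memory ever does become an issue, one variation is to build the archive on disk and let upload_file stream it to S3 in chunks, instead of holding the whole buffer in memory. A sketch along those lines (zip_files_to_disk is just an illustrative name):

import os
import zipfile

import boto3

s3 = boto3.client('s3')

def zip_files_to_disk(bucket_name, prefix):
    # Build the archive on disk instead of in memory; upload_file then streams
    # it to S3 in chunks, so peak memory stays low however large the zip gets.
    local_zip = f'{prefix.rstrip("/")}.zip'
    with zipfile.ZipFile(local_zip, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            for obj in page.get('Contents', []):
                body = s3.get_object(Bucket=bucket_name, Key=obj['Key'])['Body']
                zip_file.writestr(obj['Key'], body.read())
    s3.upload_file(local_zip, bucket_name, f'{prefix}{local_zip}')
    os.remove(local_zip)

Note that this still reads one object at a time into memory, just never the whole archive at once.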

Also: if you're running this logic within a Lambda function, you'll have to wrap it in the handler format Lambda expects.
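
A minimal entry point might look like the sketch below; the event shape here is just an assumption, and keep in mind the only writable path in Lambda is /tmp, so the local zip would need to be written there:

def lambda_handler(event, context):
    # Hypothetical event shape: {"bucket": "foobucket", "folders": ["foo/", "bar/"]}
    for folder in event['folders']:
        zip_files(event['bucket'], folder)
    return {'statusCode': 200, 'body': f"zipped {len(event['folders'])} prefixes"}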

Obviously, add logging and error handling to suit your needs.
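
As a starting point, something like this sketch using the standard logging module and botocore's ClientError:

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

for folder in folders:
    try:
        zip_files(bucket, folder)
        logger.info('zipped prefix %s', folder)
    except ClientError:
        # Log the failing prefix and move on to the remaining ones
        logger.exception('failed to zip prefix %s', folder)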

Hope this helps!