To avoid downloading the individual objects onto disk, you'll need to stream each object under a prefix into a zip in memory (remember: S3 has no real folders, just key prefixes), save the zip locally, upload it back to S3, then delete the local copy. Here's the code I would use (and tested successfully in AWS):
import boto3
import io
import os
import zipfile

s3 = boto3.client('s3')

def zip_files(bucket_name, prefix):
    # List the objects in the bucket under the specified prefix
    response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)

    # Create a BytesIO buffer to hold the compressed data in memory
    zip_buffer = io.BytesIO()

    # Open the archive once and stream each object's body straight into it
    with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
        for obj in response.get('Contents', []):
            s3_object = s3.get_object(Bucket=bucket_name, Key=obj['Key'])
            zip_file.writestr(obj['Key'], s3_object['Body'].read())

    # Save the zip file to disk
    with open(f'{prefix.rstrip("/")}.zip', 'wb') as f:
        f.write(zip_buffer.getvalue())

    # Upload the compressed data to the S3 bucket, then delete the local copy
    zip_buffer.seek(0)
    s3.put_object(Bucket=bucket_name, Key=f'{prefix}{prefix.rstrip("/")}.zip', Body=zip_buffer)
    os.remove(f'{prefix.rstrip("/")}.zip')

bucket = 'foobucket'
folders = ['foo/', 'bar/', 'baz/']

for folder in folders:
    zip_files(bucket, folder)
You haven't provided any Python code showing that you're hitting the same memory limit described in the Node.js Lambda post you linked, so I'm assuming that isn't a major concern here. Either way, the os.remove call keeps the local footprint small as the loop moves from one prefix to the next.
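
One scale note, though: list_objects_v2 returns at most 1,000 keys per call, so if a prefix ever holds more objects than that, you'd want to page through the listing instead. A minimal sketch, reusing the same s3 client as above (the helper name is mine):

paginator = s3.get_paginator('list_objects_v2')

def iter_keys(bucket_name, prefix):
    # Yield every key under the prefix, one page (up to 1,000 keys) at a time
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key']

# In zip_files, the single list_objects_v2 call could then be replaced with:
#     for key in iter_keys(bucket_name, prefix):
#         ...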
Also: if you're running this logic within a Lambda function, you'll have to adapt it to the handler signature Lambda expects (and write the temporary zip under /tmp, which is the writable scratch space in Lambda).
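
A minimal sketch of that wrapper, assuming the bucket name and prefix list arrive in the event payload (the field names here are made up) and that zip_files has been pointed at /tmp for its temporary file:

def lambda_handler(event, context):
    # The event shape below is an assumption; adjust to however you invoke it
    bucket_name = event['bucket']
    prefixes = event.get('prefixes', [])
    for prefix in prefixes:
        zip_files(bucket_name, prefix)
    return {'status': 'done', 'prefixes_zipped': len(prefixes)}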
Obviously, add logging and error handling to suit your needs.
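
If it helps as a starting point, here's a sketch of a small wrapper with a module-level logger that skips an object when the download fails (the helper name and the skip-and-continue policy are just assumptions, not a recommendation):

import logging
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

def add_object_to_zip(zip_file, bucket_name, key):
    # Download one object and write it into the open archive;
    # one bad key won't abort the whole run.
    try:
        s3_object = s3.get_object(Bucket=bucket_name, Key=key)
        zip_file.writestr(key, s3_object['Body'].read())
        logger.info("Added %s to the archive", key)
    except ClientError as err:
        logger.error("Skipping %s: %s", key, err)

You'd call it from inside the with zipfile.ZipFile(...) block in zip_files, in place of the two lines that fetch and write each object.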
Hope this helps!