
Around 60 CSV files are generated daily in my S3 bucket. The average size of each file is around 500 MB. I want to zip all these files with a Lambda function on the fly (without downloading a file inside the Lambda execution environment) and upload the zipped files to another S3 bucket. I came across these solutions 1 and 2, but I am still having issues with the implementation. Right now, I am trying to stream CSV file data into a zipped file (this zip file is created in the Lambda tmp directory) and then upload it to S3. But I am getting this error message while writing into the zip file: [Errno 36] File name too long

This is my test Lambda function where I am just trying with one file, but in the actual case I need to zip 50-60 CSV files individually:

import boto3
import zipfile


def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    # Stream the object line by line instead of downloading it first
    iterator = s3.Object('bucket-name', 'file-name').get()['Body'].iter_lines()
    my_zip = zipfile.ZipFile('/tmp/test.zip', 'w')
    for line in iterator:
        # This is where [Errno 36] is raised: ZipFile.write() expects a
        # file *path* on disk, so each CSV line is treated as a file name
        my_zip.write(line)
    my_zip.close()

    s3.meta.client.upload_file('/tmp/test.zip', 'another-bucket-name', 'object-name')

Also, is there a way I can stream data from my CSV file, zip it and upload it to another S3 bucket without actually saving the full zip file in Lambda memory?

Raman Balyan
  • What kind of data is it? If they are CSVs or other data formats Athena can read, I've used a scheduled Athena query to zip files, but the solution is a little weird ;-) – Maurice Dec 28 '20 at 10:52
  • @Maurice Data is in CSV format. What was your solution? – Raman Balyan Dec 28 '20 at 10:54

1 Answer


After a lot of research and trials, I was able to make it work. I used the smart_open library and managed to compress a 550 MB file with just 150 MB of memory usage in my Lambda. To use an external library, I had to use Layers in Lambda. Here is my code:

from smart_open import open


def lambda_handler(event, context):
    # Both objects are streamed, so the full file never sits in memory.
    # Compression is inferred from the destination key's extension,
    # so the output key should end in .gz (or .bz2).
    with open('s3://bucket-name-where-large-file/file-key-name') as fin:
        with open('s3://bucket-name-to-put-zip-file/zip-file-key-name.gz', 'w') as fout:
            for line in fin:
                fout.write(line)
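
For the real workload (50-60 files a day), the same pattern can sit inside a loop over a bucket listing. Here is a rough sketch, assuming placeholder bucket names, a .csv suffix filter, and that one invocation fits within the Lambda time limit (with 60 files of ~500 MB each you may need to fan out, e.g. one file per invocation):

import boto3
from smart_open import open


def lambda_handler(event, context):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    # Walk every object in the source bucket and compress each CSV
    # to its own .gz object in the destination bucket
    for page in paginator.paginate(Bucket='bucket-name-where-large-file'):
        for obj in page.get('Contents', []):
            key = obj['Key']
            if not key.endswith('.csv'):
                continue
            with open('s3://bucket-name-where-large-file/' + key, 'rb') as fin:
                with open('s3://bucket-name-to-put-zip-file/' + key + '.gz', 'wb') as fout:
                    # Copy in 1 MB chunks to keep memory usage flat
                    for chunk in iter(lambda: fin.read(1024 * 1024), b''):
                        fout.write(chunk)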

Please note, smart_open supports .gz and .bz2 compression out of the box. If you want to compress files in other formats, you can register your own compressor using the register_compressor method of this library.
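
For example, here is a minimal sketch of a custom .xz compressor, following the register_compressor hook from the smart_open documentation (the handler name _handle_xz is just illustrative):

import lzma
from smart_open import open, register_compressor


def _handle_xz(file_obj, mode):
    # smart_open calls this with the raw S3 stream whenever a key ends in .xz
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


register_compressor('.xz', _handle_xz)

with open('s3://bucket-name-to-put-zip-file/file-key-name.xz', 'w') as fout:
    fout.write('some text')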

Raman Balyan