I am trying to create a serverless processor for my cron job. In this job I receive a zipped file in my S3 bucket from one of my clients. The file is around 50 MB in size, but once unzipped it grows to about 1.5 GB, and there is a hard limit of roughly 500 MB on the disk space available to AWS Lambda, so I cannot download the file from the S3 bucket and unzip it on my Lambda. I was able to unzip the file and stream its content line by line from S3 using funzip in a Unix shell script:
for x in $files ; do echo -n "$x: " ; timeout 5 aws s3 cp "$monkeydir/$x" - | funzip ; done
My bucket name: MonkeyBusiness
Key: /Daily/Business/Banana/{current-date}
Object: banana.zip
Now I am trying to achieve the same output using boto3: how can I stream the zipped content from S3, unzip the stream, save the content into separate files of 10,000 lines each, and upload the chunked files back to S3? I need guidance as I am pretty new to AWS and boto3.
Please let me know if you need more details about the job.
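To make the question more concrete, here is a rough, untested sketch of the splitting/uploading half of what I am after, assuming I already have an iterator over the decompressed text lines. The chunk key prefix and part-file naming are placeholders I made up, not anything my client expects:

import boto3
from itertools import islice

s3 = boto3.client("s3")

def upload_in_chunks(lines, bucket="MonkeyBusiness",
                     chunk_key_prefix="Daily/Business/Banana/chunks/",
                     chunk_size=10000):
    # Consume the line iterator in slices of chunk_size lines and write each
    # slice back to S3 as its own object.
    lines = iter(lines)
    part = 0
    while True:
        chunk = list(islice(lines, chunk_size))
        if not chunk:
            break
        s3.put_object(Bucket=bucket,
                      Key="{}part-{:05d}.txt".format(chunk_key_prefix, part),
                      Body="".join(chunk).encode("utf-8"))
        part += 1

What I cannot work out is how to produce that line iterator from a zip (not gzip) object without writing 1.5 GB to disk.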
The suggested solution given below is not applicable here, because the zlib documentation clearly states that the library handles the gzip format, whereas my question is about the zip file format.
import zlib

def stream_gzip_decompress(stream):
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)  # offset 32 to skip the header
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv
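The closest I have come for the zip container itself is the sketch below. It sidesteps true streaming by reading the whole ~50 MB archive into memory (the 1.5 GB of uncompressed data is never materialized, since lines are pulled out one at a time) and assumes the archive contains a single text member; I am not sure this is an acceptable pattern on Lambda, which is exactly where I need guidance:

import io
import zipfile
import boto3

s3 = boto3.client("s3")

def iter_lines_from_zip(bucket, key):
    # Download the compressed archive into memory (~50 MB) ...
    obj = s3.get_object(Bucket=bucket, Key=key)
    buf = io.BytesIO(obj["Body"].read())
    # ... then iterate over the lines of the first member, decompressing
    # incrementally instead of unzipping the whole 1.5 GB anywhere.
    with zipfile.ZipFile(buf) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as fh:
            for raw in fh:
                yield raw.decode("utf-8")

# Usage (the date portion of the key is whatever the current run's date is):
# lines = iter_lines_from_zip("MonkeyBusiness", "Daily/Business/Banana/{current-date}/banana.zip")
# upload_in_chunks(lines)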