Using an AWS Lambda function, I download a zipped file from S3 and unzip it. For now I do it with extractall. Upon unzipping, all files are saved in the tmp/ folder.
import boto3
import zipfile

s3 = boto3.client('s3')
s3.download_file('test', '10000838.zip', '/tmp/10000838.zip')
with zipfile.ZipFile('/tmp/10000838.zip', 'r') as zip_ref:
    lstNEW = list(filter(lambda x: not x.startswith("__MACOSX/"), zip_ref.namelist()))
    zip_ref.extractall('/tmp/', members=lstNEW)
After unzipping, I want to gzip the files and place them in another S3 bucket. How can I read all the files from the tmp folder again and gzip each one so that it ends up as $item.csv.gz?
I see the gzip module docs (https://docs.python.org/3/library/gzip.html) but I am not sure which function to use. If it is the compress function, how exactly do I use it? I read in the answer "gzip a file in Python" that I can use gzip.open('', 'wb') to gzip a file, but I couldn't figure out how to apply it in my case. In the open function, do I specify the target location or the source location? And where do I save the gzipped files so that I can later upload them to S3?
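For reference, here is a minimal sketch of the disk-based approach I have in mind, assuming the files are already extracted to /tmp and using a placeholder bucket name 'target-bucket'. gzip.open() takes the path of the compressed file you want to create (the target), and you stream the source file's bytes into it:

import gzip
import os
import shutil
import boto3

s3 = boto3.client('s3')

for name in lstNEW:                      # the members extracted earlier
    src = os.path.join('/tmp', name)
    if os.path.isdir(src):               # skip directory entries from the zip
        continue
    dst = src + '.gz'                    # e.g. /tmp/file.csv -> /tmp/file.csv.gz
    with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)  # stream-copy so large files are not loaded into memory
    s3.upload_file(dst, 'target-bucket', name + '.gz')

Is something like this the intended use, or is gzip.compress preferable here?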
Alternative Option:
Instead of extracting everything into the tmp folder, I read that I can also open an output stream, wrap it in a gzip wrapper, and then copy from one stream to the other:
with zipfile.ZipFile('/tmp/10000838.zip', 'r') as zip_ref:
    testList = []
    for i in zip_ref.namelist():
        if not i.startswith("__MACOSX/"):
            testList.append(i)
    for i in testList:
        zip_ref.open(i, 'r')
but then again I am not sure how to continue in the for loop, i.e. how to open each member as a stream and gzip the files from there.
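In case it clarifies the question, this is roughly what I imagine the streaming version would look like, again with a placeholder bucket name 'target-bucket': each zip member is opened as a file object, compressed into an in-memory buffer via gzip.GzipFile, and the buffer is handed to upload_fileobj, so nothing is written to /tmp:

import gzip
import io
import shutil
import zipfile
import boto3

s3 = boto3.client('s3')

with zipfile.ZipFile('/tmp/10000838.zip', 'r') as zip_ref:
    members = [m for m in zip_ref.namelist()
               if not m.startswith("__MACOSX/") and not m.endswith('/')]
    for name in members:
        buf = io.BytesIO()
        # wrap the buffer in a gzip writer and copy the zip member into it
        with zip_ref.open(name, 'r') as src, gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            shutil.copyfileobj(src, gz)
        buf.seek(0)                      # rewind so upload_fileobj reads from the start
        s3.upload_fileobj(buf, 'target-bucket', name + '.gz')

Is this the right way to wire the streams together, or is there a better pattern for doing this inside Lambda?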