I have a use case where very large .gz files are generated by Snowflake and stored in S3.

Now I want to convert each file separately into a zip file using Python, AWS Lambda, and boto3.

I am getting an exception with the code below. Can someone help me with it?

    def convert_gz_s3_files(prefix):
        s3 = boto3.session.Session().client('s3')
        gzFiles = getGZFiles(s3, prefix)
        try:
            for gzipFile in gzFiles:
                obj = s3.get_object(Bucket=bucket, Key=gzipFile[0])
                with gzip.GzipFile(fileobj=obj['Body']) as gz:
                    data = gz.read().decode()
                    archive = zipfile.ZipFile(data, mode="r")

                    # with zipfile.ZipFile(data, mode='w', allowZip64 = True) as zip:
                    #     zipData = zip.read().decode()
                    #     zipObject = s3.Object(bucket, prefix+'file_name.csv.zip')
                    #     zipRes = zipObject.put(body=zipData)
                    #     print('zip result => {}', zipRes)

                print(gzipFile)
        except Exception as e:
            logger.error({'error': str(e), 'message': 'failed gzip decoding'})
            raise e

        print(gzFiles)
John Rotenstein
Rajeev
  • _What_ exception, specifically? Include it in the question. – Charles Duffy Jul 25 '21 at 12:45
  • Mind, the first argument to ZipFile needs to be a file-like object or a string that works as a path, and it doesn't look like you're doing either of those things here. In general, you want to _create_ your ZipFile with an empty backing file or file-like object (which to choose depending on how large the largest file in s3 is and how much memory you have available to use), populate it, and then dump that backing file into an s3 bucket. – Charles Duffy Jul 25 '21 at 12:46
  • [python in-memory zip library](https://stackoverflow.com/questions/2463770/python-in-memory-zip-library) describes the piece you're presumably missing based on a code read, though without an actual exception or [mre] included, we can't be 100% sure it's the same problem you're asking about. – Charles Duffy Jul 25 '21 at 12:48
  • That said, a zip file is a bad choice to replace a .gz file, because they're completely different file formats. A zip file is an _archive_ format -- it bundles multiple files together into one. A gz file just compresses _one single_ file, and metadata about that file is optional (because it's assumed that file naming/permissions/etc will all be set on the gz file itself). So when you move content from a .gz archive to a .zip archive, a lot of the information the zip file needs to have (filename! permissions!) isn't necessarily there in the .gz file at all. – Charles Duffy Jul 25 '21 at 12:50
  • Taking a closer look at the question linked above, all the essential elements of my answer here are already given in another answer there; so I'm deleting my answer and closing the question as duplicate. – Charles Duffy Jul 25 '21 at 13:10
  • With a wee bit of attention paid to the [gzip](https://datatracker.ietf.org/doc/html/rfc1952) and [zip](https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.9.TXT) formats, you can convert a gzip file to a single-entry zip file _without_ decompressing and recompressing. Both formats use the deflate compressed format. So you just need to strip the header and trailer from the gzip file, and repackage the compressed data into a zip file. See https://stackoverflow.com/a/52436243/1180620 . – Mark Adler Jul 25 '21 at 19:10
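
The suggestion in the comments (create the ZipFile on an empty backing file-like object, populate it, then dump that backing file into S3) could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested Lambda deployment: the helper names `gzip_stream_to_zip` and `convert_s3_object`, the `.zip` key naming, the entry name derived from the key, and the 64 MiB spool limit are all hypothetical. It takes the simpler decompress-and-recompress route rather than repackaging the raw deflate data, and it copies in chunks so the whole file is never held in memory at once.

```python
import gzip
import posixpath
import shutil
import tempfile
import zipfile

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time so memory use stays flat


def gzip_stream_to_zip(src, dst, arcname):
    """Decompress the gzip stream `src` and write it to `dst` as a one-entry zip.

    `src` only needs read(); `dst` is a writable, seekable file-like object.
    """
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        with zipfile.ZipFile(dst, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
            # The uncompressed size is unknown up front, so allow ZIP64 sizes.
            with archive.open(arcname, mode="w", force_zip64=True) as entry:
                shutil.copyfileobj(gz, entry, CHUNK_SIZE)


def convert_s3_object(s3, bucket, key, spool_limit=64 * 1024 * 1024):
    """Hypothetical helper: `s3` is assumed to be a boto3 S3 client.

    Reads `key` (a .gz object) and uploads a .zip next to it; returns the new key.
    """
    # Assumption: the zip entry is named after the key, minus the ".gz" suffix.
    arcname = posixpath.basename(key)
    if arcname.endswith(".gz"):
        arcname = arcname[:-3]
    zip_key = (key[:-3] if key.endswith(".gz") else key) + ".zip"

    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    # The backing file stays in memory up to spool_limit, then spills to disk.
    with tempfile.SpooledTemporaryFile(max_size=spool_limit) as backing:
        gzip_stream_to_zip(body, backing, arcname)
        backing.seek(0)  # rewind so the upload starts at the first byte
        s3.upload_fileobj(backing, bucket, zip_key)
    return zip_key
```

Inside the original loop this would presumably be called as `convert_s3_object(s3, bucket, gzipFile[0])`. Because the gzip file may not record a file name at all, the entry name has to come from somewhere else, which is why the sketch derives it from the S3 key. For files larger than the Lambda's memory, the spooled file spills to the function's temporary storage, so that storage needs to be large enough to hold the resulting zip.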

0 Answers