Download a gzipped file, md5 checksum it, and then save extracted data if matches

Question

I'm currently attempting to download two files using Python, one a gzipped file, and the other, its checksum.

I would like to verify that the gzipped file's contents match the md5 checksum, and then I would like to save the contents to a target directory.

I found out how to download the files here, and I learned how to calculate the checksum here. I load the URLs from a JSON config file, and I learned how to parse JSON file values here.

I put it all together into the following script, but I'm stuck attempting to store the verified contents of the gzipped file.

import json
import gzip
import urllib
import hashlib

# Function for creating an md5 checksum of a file
def md5Gzip(fname):
    hash_md5 = hashlib.md5()

    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)

    return hash_md5.hexdigest()

# Open the configuration file in the current directory
with open('./config.json') as configFile:
    data = json.load(configFile)

# Open the downloaded checksum file
with open(urllib.urlretrieve(data['checksumUrl'])[0]) as checksumFile:
    # Strip surrounding whitespace (e.g. a trailing newline) before comparing
    md5Checksum = checksumFile.read().strip()

# Open the downloaded db file and get its md5 checksum via gzip.open
fileMd5 = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])

if (fileMd5 == md5Checksum):
    print 'Downloaded Correct File'
    # save correct file
else:
    print 'Downloaded Incorrect File'
    # do some error handling

In your `md5Gzip`, return a `tuple` instead of just the hash. i.e `return hash_md5.digest(), file_content` — Quan To, Sep 23 '16 at 04:08

score 1 · Accepted Answer · answered Sep 23 '16 at 04:13

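In your md5Gzip, return a tuple instead of just the hash, so the caller also receives the decompressed content and can write it out once the checksum matches.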

def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    file_content = None

    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
        # Rewind to the start of the decompressed stream and read it all
        f.seek(0)
        file_content = f.read()

    return hash_md5.hexdigest(), file_content

Then, in your code:

fileMd5, file_content = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])
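
To finish the "save if it matches" step, here is a minimal sketch of one way to do it. The names `md5_and_content` and `save_if_verified` are hypothetical and not from the original script. It assumes the checksum covers the decompressed data (as in the question) and that the checksum file stores the hex digest as its first whitespace-separated token. It also collects the chunks during the hashing pass instead of rewinding and reading the file a second time.

```python
import gzip
import hashlib


def md5_and_content(gz_path, chunk_size=4096):
    """Return (hex MD5, bytes) for the decompressed data in gz_path."""
    hash_md5 = hashlib.md5()
    chunks = []
    with gzip.open(gz_path, 'rb') as f:
        # Read fixed-size chunks until an empty byte string signals EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
            chunks.append(chunk)
    return hash_md5.hexdigest(), b"".join(chunks)


def save_if_verified(gz_path, checksum_text, target_path):
    """Write the decompressed data to target_path only if the MD5 matches."""
    # Assumption: the digest is the first token of the checksum file,
    # possibly followed by a file name and a trailing newline.
    tokens = checksum_text.split()
    expected = tokens[0].lower() if tokens else ""
    actual, content = md5_and_content(gz_path)
    if actual != expected:
        return False
    with open(target_path, 'wb') as out:
        out.write(content)
    return True
```

In the question's script this could be called as `save_if_verified(downloaded_path, md5Checksum, target_path)`, where `downloaded_path` is the first element returned by `urlretrieve` and `target_path` is wherever the extracted file should go. Note that the whole decompressed file is held in memory, which may not suit very large downloads.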
