
I am trying to download some files from Amazon S3 using Python and extract them to the local disk.

Here is the code I am using to download and extract the files.

import os
import shutil
import subprocess

import requests

def download(self):
    # Build the local file name from the job id, job name, and counter
    filename = BASE_OUTPUT_DIR + "/" + str(self.jobId) + "_" + self.jobName.replace(" ", "_") + "_" + str(self.count)
    try:
        response = requests.get(self.fileUrl.rstrip(), stream=True, verify=False)
        if response.status_code == 200:
            # Stream the raw response body straight into a .gz file on disk
            with open(filename + ".gz", 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
            del response
            # gunzip resolves filename.gz; --name restores the stored original name
            subprocess.call(["gunzip", "--name", filename])
            return filename
    except Exception as e:
        self.logger.debug("An exception occurred while downloading file (%s)" % (filename))
        self.logger.debug("Trying to delete the file")
        try:
            os.system('rm -rf %s' % filename)
        except Exception as e:
            self.logger.debug(e)
    return None

Sometimes I get this error: `gzip: /tmp/files/file_8340.gz: unexpected end of file`.

I would suspect the files themselves, but this issue happens randomly with random files. I guess the download did not complete, but I am not sure.
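
One way to confirm that suspicion would be to download in chunks with `response.iter_content()` and compare the number of bytes written against the `Content-Length` header. This is only a minimal sketch, assuming the server sends `Content-Length` and does not apply a `Content-Encoding` that changes the byte count; the function name and chunk size are illustrative:

    import requests

    def download_and_verify(url, dest_path, chunk_size=1024 * 1024):
        """Download url to dest_path in chunks and check the byte count
        against the Content-Length header, if the server provides one."""
        response = requests.get(url, stream=True, verify=False)
        response.raise_for_status()

        written = 0
        with open(dest_path, 'wb') as out_file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip keep-alive chunks
                    out_file.write(chunk)
                    written += len(chunk)

        # If the sizes disagree, the download was truncated
        expected = response.headers.get('Content-Length')
        if expected is not None and written != int(expected):
            raise IOError("Incomplete download: got %d of %s bytes" % (written, expected))
        return dest_path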

  • You're not using the `stream=True` for anything, so you may as well remove that, or call `response.iter_content()` to download in chunks. Also, why not just unzip the contents in Python? `import gzip; gzip.decompress(response.raw)`. See more examples in this comment: https://stackoverflow.com/a/13137873/219640 – Erik Cederstrand Jan 18 '18 at 08:43
  • @ErikCederstrand, When I removed `stream=True` the file size became 0 byte and the files are not downloaded at all. – Fanooos Jan 18 '18 at 10:09
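
For reference, a minimal sketch of the in-Python decompression suggested in the comment above, wrapping the raw stream in `gzip.GzipFile` instead of shelling out to `gunzip` (the function name is illustrative; a truncated download would still surface here, as an `EOFError`):

    import gzip
    import shutil

    import requests

    def download_and_extract(url, dest_path):
        """Stream the .gz response and decompress it in Python."""
        response = requests.get(url, stream=True, verify=False)
        response.raise_for_status()
        # gzip.GzipFile wraps the raw socket file object and decompresses
        # on the fly while it is copied to disk
        with gzip.GzipFile(fileobj=response.raw) as gz_in, open(dest_path, 'wb') as out_file:
            shutil.copyfileobj(gz_in, out_file)
        return dest_path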

0 Answers