
I have multiple URLs that return zip files. I'm able to download most of the files using the urllib2 library as follows:

import urllib2

request = urllib2.urlopen(url)
zip_file = request.read()

The problem is that one of the files is 35 MB in size (zipped), and the download never finishes with this library. I can download the same file normally using wget and the browser.

I have tried downloading the file in chunks like this:

request = urllib2.urlopen(url)
buffers = []
while True:
    buffer = request.read(8192)
    if buffer:
        buffers.append(buffer)
    else:
        break
final_file = ''.join(buffers)

But this also does not finish the download. No error is raised, so it's hard to debug what is happening. Unfortunately, I can't post the URL or an example file here.

Any suggestions or advice?

duduklein
  • It's pretty hard to debug without further information or a URL which reproduces it. However, why not just use `final_file = request.read()`? Your code above is building an array of strings which will store all the data in memory, so I don't see any reason to complicate the code to read chunk at a time. – Ben Hoyt Apr 24 '12 at 20:19
  • possible duplicate of [How do I download a zip file in python using urllib2?](http://stackoverflow.com/questions/4028697/how-do-i-download-a-zip-file-in-python-using-urllib2) – Léo Léopold Hertz 준영 Apr 24 '12 at 20:21
  • @benhoyt this was my first attempt, but it did not work. That's why I tried to divide the file into chunks – duduklein Apr 24 '12 at 20:29
  • possible duplicate of [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file) – ChristopheD Apr 24 '12 at 20:43
  • not the same as Masi suggested, but maybe the same as @ChristopheD. At least, the problem and the proposed solutions seem to be pretty close – duduklein Apr 25 '12 at 11:19

1 Answer


This is copied and pasted from my application, which downloads its own update installer. It reads the file in blocks and immediately writes each block to an output file on disk.

import socket
import urllib2
from os import path

def DownloadThreadFunc(self):
    try:
        url = self.lines[1]
        data = None
        req = urllib2.Request(url, data, {})
        handle = urllib2.urlopen(req)

        self.size = int(handle.info()["Content-Length"])
        self.actualSize = 0
        name = path.join(DIR_UPDATES, url.split("/")[-1])
        blocksize = 64*1024

        fo = open(name, "wb")
        while not self.terminate:
            block = handle.read(blocksize)
            self.actualSize += len(block)
            if len(block) == 0:  # an empty read means the download is complete
                break
            fo.write(block)
        fo.close()
    except (urllib2.URLError, socket.timeout), e:
        try:
            fo.close()
        except:
            pass
        error("Download failed.", unicode(e))

I use self.size and self.actualSize to show the download progress in the GUI thread, and self.terminate to cancel the download from a GUI button if needed.
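For reference, the same block-by-block approach can be written with the Python 3 standard library, where urllib2 became urllib.request. This is a minimal sketch, not the code from my application; the function names and parameters here are my own:

```python
import urllib.request


def copy_in_blocks(src, dst, block_size=64 * 1024):
    """Read from one file-like object and write to another, one block at a time."""
    while True:
        block = src.read(block_size)
        if not block:  # empty bytes object signals end of stream
            break
        dst.write(block)


def download(url, dest_path, timeout=30):
    """Stream a URL to disk without holding the whole file in memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        copy_in_blocks(response, out)
```

Setting an explicit timeout also helps here: a stalled connection raises an error instead of hanging silently, which is easier to debug than a download that never finishes.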

Fenikso
  • This worked perfectly! Thanks a lot. But can you tell me why my simplified version did not work? – duduklein Apr 24 '12 at 21:48
  • No idea. But I kind of suspect this construction: `if buffer:`. I like to store the block to the disk directly so the progress can be seen on the output file also. – Fenikso Apr 25 '12 at 07:51
  • I see your point. I tried your version of the code also storing the file only in memory and it worked as well. Could it be my initial buffer size (too small)? – duduklein Apr 25 '12 at 11:16
  • I think it may be the condition. When does your code get stuck? Is the file already downloaded in the memory? Add some debug printouts about how much you have already downloaded, what is the size of the data etc. – Fenikso Apr 25 '12 at 12:00
  • I'm now using `if not block` and it's working fine. It seems the blocksize was too small; that's the only difference. Thanks anyway! – duduklein Apr 26 '12 at 13:01
  • You can also use the `flush` method to write to the disk. – cwahls Mar 28 '16 at 01:08