Python urllib2 Progress Hook

Question

I am trying to create a download progress bar in Python using the urllib2 HTTP client. I've looked through the API (and on Google), and it seems that urllib2 does not allow you to register progress hooks. However, the older, deprecated urllib does have this functionality.

Does anyone know how to create a progress bar or reporting hook using urllib2? Or are there some other hacks to get similar functionality?
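
For context, the hook the question refers to is the `reporthook` argument of `urlretrieve` (`urllib.urlretrieve` in Python 2, `urllib.request.urlretrieve` in Python 3). Below is a minimal Python 3 sketch of such a hook. The names `percent_done` and `report` are illustrative and not from the original post.

```python
import sys


def percent_done(block_num, block_size, total_size):
    """Return progress as a percentage, or None if the size is unknown."""
    if total_size <= 0:
        # urlretrieve passes -1 when the server reports no size
        return None
    # The last block is usually partial, so cap the estimate at 100.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def report(block_num, block_size, total_size):
    """Hook with the (block_num, block_size, total_size) signature that urlretrieve calls."""
    percent = percent_done(block_num, block_size, total_size)
    if percent is None:
        sys.stdout.write("Downloaded about %d bytes\r" % (block_num * block_size))
    else:
        sys.stdout.write("Downloaded %0.2f%%\r" % percent)
    sys.stdout.flush()
```

It would be passed as `urllib.request.urlretrieve(url, filename, reporthook=report)`, where `url` and `filename` are whatever the caller supplies.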

Kenan Banks · Accepted Answer · 2010-01-08T23:25:36.780

42

Here's a fully working example that builds on Anurag's approach of reading the response in chunks. My version allows you to set the chunk size and attach an arbitrary reporting function:

import urllib2, sys

def chunk_report(bytes_so_far, chunk_size, total_size):
   percent = float(bytes_so_far) / total_size
   percent = round(percent*100, 2)
   sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" % 
       (bytes_so_far, total_size, percent))

   if bytes_so_far >= total_size:
      sys.stdout.write('\n')

def chunk_read(response, chunk_size=8192, report_hook=None):
   total_size = response.info().getheader('Content-Length').strip()
   total_size = int(total_size)
   bytes_so_far = 0

   while 1:
      chunk = response.read(chunk_size)
      bytes_so_far += len(chunk)

      if not chunk:
         break

      if report_hook:
         report_hook(bytes_so_far, chunk_size, total_size)

   return bytes_so_far

if __name__ == '__main__':
   response = urllib2.urlopen('http://www.ebay.com')
   chunk_read(response, report_hook=chunk_report)

edited Jan 08 '10 at 23:25

answered Jan 08 '10 at 19:11

Kenan Banks


3

That's great for downloading. Is there something similar for uploading (i.e. writing large amounts of POST data)? – speedplane Jan 11 '10 at 05:53
1

Where exactly does this download the file to? I can't seem to find it. – Zac Brown Dec 02 '10 at 16:00
1

@Zachary As far as I can tell, this isn't 'saving' a file; it's opening a url. To save the file you would do `file = open('myfile.html', 'wb')` then `file.write(response.read())` – styfle May 20 '11 at 04:13
2

Note, a response doesn't always include the "Content-Length" header. This will fail for servers that don't support it. – Cerin Dec 14 '13 at 15:26
1

@styfle: Actually, it's opening a URL and *discarding* its bytes. And a `response.read()` would defeat the whole point of a periodic progress report. – MestreLion Jun 07 '14 at 10:53
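
Following up on the comment about a missing Content-Length header: below is a minimal Python 3 sketch that reads the total size defensively and reports progress either way. `get_total_size` and `format_progress` are hypothetical helper names, and `headers` is assumed to be any object with a dict-style `get`, such as `response.headers`.

```python
def get_total_size(headers):
    """Return Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


def format_progress(bytes_so_far, total_size):
    """Build a progress line; omit the percentage when the total is unknown."""
    if not total_size:
        return "Downloaded %d bytes" % bytes_so_far
    percent = bytes_so_far * 100.0 / total_size
    return "Downloaded %d of %d bytes (%0.2f%%)" % (bytes_so_far, total_size, percent)
```

A report hook could then write `format_progress(...)` followed by `\r`, so the same hook works whether or not the server sends the header.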

Anurag Uniyal · Answer 2 · 2010-01-08T15:55:44.443

12

Why not just read the data in chunks and do whatever you want in between, e.g. run in a thread, hook into a UI, etc.?

import urllib2

urlfile = urllib2.urlopen("http://www.google.com")

data_list = []
chunk = 4096
while 1:
    data = urlfile.read(chunk)
    if not data:
        print "done."
        break
    data_list.append(data)
    print "Read %s bytes"%len(data)

Output:

Read 4096 bytes
Read 3113 bytes
done.

edited Jan 08 '10 at 15:55

answered Jan 08 '10 at 15:48

Anurag Uniyal


Only thing is, I think the last line should be `print "Read %s bytes" % len(data_list)` – Zac Brown Oct 23 '10 at 20:24
@Zachary Brown: No, because I am just printing how much data is read each time. Printing the total read so far would be better, but even then it wouldn't be len(data_list). – Anurag Uniyal Oct 24 '10 at 03:22
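
The running total mentioned in the last comment could be tracked with a small generator. This is a Python 3 sketch that assumes only that the source object has a `read(size)` method, as a `urlopen` response does. `iter_chunks` is an illustrative name.

```python
import io


def iter_chunks(stream, chunk_size=4096):
    """Yield (chunk, total_bytes_read) pairs until the stream is exhausted."""
    total = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        total += len(data)
        yield data, total


if __name__ == "__main__":
    # An in-memory stream stands in for the network response here.
    fake_response = io.BytesIO(b"x" * 7209)
    for data, total in iter_chunks(fake_response):
        print("Read %d bytes (%d total)" % (len(data), total))
```

Whatever happens between reads (updating a UI, appending to a list) goes in the body of the `for` loop.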

score 5 · Answer 3 · edited Feb 08 '11 at 01:10

5

urlgrabber has built-in support for progress notification.

edited Feb 08 '11 at 01:10

joshk0


answered Jan 08 '10 at 15:53

Ignacio Vazquez-Abrams


score 1 · Answer 4 · answered Jan 15 '15 at 19:22

Simplified version:

import sys
import urllib2

# file_url is assumed to be defined earlier and to hold the URL to download
temp_filename = "/tmp/" + file_url.split('/')[-1]
remote_file = urllib2.urlopen(file_url)

try:
    total_size = remote_file.info().getheader('Content-Length').strip()
    header = True
except AttributeError:
    header = False # a response doesn't always include the "Content-Length" header

if header:
    total_size = int(total_size)

bytes_so_far = 0

# the with block closes the output file even if the download fails midway
with open(temp_filename, 'wb') as f:
    while True:
        chunk = remote_file.read(8192)
        if not chunk:
            sys.stdout.write('\n')
            break

        bytes_so_far += len(chunk)
        f.write(chunk)
        if not header:
            total_size = bytes_so_far # unknown size

        percent = float(bytes_so_far) / total_size
        percent = round(percent*100, 2)
        sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" % (bytes_so_far, total_size, percent))

score 0 · Answer 5 · answered Jun 16 '19 at 14:27

Minor modification to Triptych's response to allow for actually writing out the file (Python 3):

import sys
from urllib.request import urlopen

def chunk_report(bytes_so_far, chunk_size, total_size):
    percent = float(bytes_so_far) / total_size
    percent = round(percent*100, 2)
    sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" %
                     (bytes_so_far, total_size, percent))

    if bytes_so_far >= total_size:
        sys.stdout.write('\n')


def chunk_read(response, chunk_size=8192, report_hook=None):
    total_size = response.info().get("Content-Length").strip()
    total_size = int(total_size)
    bytes_so_far = 0
    data = b""

    while 1:
        chunk = response.read(chunk_size)
        bytes_so_far += len(chunk)

        if not chunk:
            break

        if report_hook:
            report_hook(bytes_so_far, chunk_size, total_size)

        data += chunk

    return data

Usage:

with open(out_path, "wb") as f:
    response = urlopen(filepath)
    data_read = chunk_read(response, report_hook=chunk_report)

    f.write(data_read)
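
Because `chunk_read` above collects the whole body in memory before anything is written, a large download may use a lot of RAM. A possible variant writes each chunk as it arrives. This is a sketch, not the answer's own code: `copy_with_progress` is a hypothetical name, and its arguments are assumed to be any readable and writable binary file-like objects.

```python
import io


def copy_with_progress(response, out_file, chunk_size=8192,
                       total_size=None, report_hook=None):
    """Copy response to out_file chunk by chunk; return the byte count."""
    bytes_so_far = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        out_file.write(chunk)
        bytes_so_far += len(chunk)
        if report_hook:
            # total_size is passed through as given (None when unknown).
            report_hook(bytes_so_far, chunk_size, total_size)
    return bytes_so_far


if __name__ == "__main__":
    # In-memory buffers stand in for the HTTP response and the output file.
    source = io.BytesIO(b"a" * 20000)
    target = io.BytesIO()
    copy_with_progress(source, target, total_size=20000,
                       report_hook=lambda done, size, total: print(done, total))
```

With a real download, `response` would be the object returned by `urlopen` and `out_file` the file opened in `"wb"` mode. A hook such as `chunk_report` needs `total_size` to be a known number.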
