29

I am trying to create a download progress bar in Python using the urllib2 HTTP client. I've looked through the API (and on Google), and it seems that urllib2 does not allow you to register progress hooks. However, the older, deprecated urllib does have this functionality.

Does anyone know how to create a progress bar or reporting hook using urllib2? Or are there some other hacks to get similar functionality?
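
For reference, the hook in the older interface is the `reporthook` argument of `urlretrieve`, which in Python 3 lives on as `urllib.request.urlretrieve`. The sketch below shows roughly what such a hook looks like. The names `format_progress`, `report_hook`, and `download` are made up for illustration, and the hook assumes the size may be reported as -1 when the server does not send one.

```python
import sys
from urllib.request import urlretrieve


def format_progress(block_num, block_size, total_size):
    """Build a progress line from the arguments urlretrieve passes to its hook."""
    done = block_num * block_size
    if total_size > 0:
        # The last block is usually short, so the product can overshoot.
        done = min(done, total_size)
        return "Downloaded %d of %d bytes (%0.2f%%)" % (
            done, total_size, 100.0 * done / total_size)
    # total_size is not positive when the server did not report a size.
    return "Downloaded %d bytes" % done


def report_hook(block_num, block_size, total_size):
    sys.stdout.write(format_progress(block_num, block_size, total_size) + "\r")
    sys.stdout.flush()


def download(url, filename):
    # Hypothetical helper: url and filename are supplied by the caller.
    return urlretrieve(url, filename, reporthook=report_hook)
```

Calling `download(url, filename)` would then rewrite one progress line in place as blocks arrive.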

speedplane

5 Answers

42

Here's a fully working example that builds on Anurag's approach of reading the response in chunks. My version lets you set the chunk size and attach an arbitrary reporting function:

import sys
import urllib2

def chunk_report(bytes_so_far, chunk_size, total_size):
   # Rewrite a single status line in place using '\r'
   percent = float(bytes_so_far) / total_size
   percent = round(percent*100, 2)
   sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" %
       (bytes_so_far, total_size, percent))

   # Move to a fresh line once the download is complete
   if bytes_so_far >= total_size:
      sys.stdout.write('\n')

def chunk_read(response, chunk_size=8192, report_hook=None):
   # Assumes the server sends a Content-Length header;
   # getheader() returns None when it is missing
   total_size = response.info().getheader('Content-Length').strip()
   total_size = int(total_size)
   bytes_so_far = 0

   while True:
      chunk = response.read(chunk_size)
      if not chunk:
         break  # an empty read means the response is exhausted

      bytes_so_far += len(chunk)

      if report_hook:
         report_hook(bytes_so_far, chunk_size, total_size)

   return bytes_so_far

if __name__ == '__main__':
   response = urllib2.urlopen('http://www.ebay.com')
   chunk_read(response, report_hook=chunk_report)
Kenan Banks
  • 3
    That's great for downloading. Is there something similar for uploading (i.e. writing large amounts of POST data)? – speedplane Jan 11 '10 at 05:53
  • 1
    where exactly does this download your file to? I can't seem to find it. – Zac Brown Dec 02 '10 at 16:00
  • 1
    @Zachary As far as I can tell, this isn't 'saving' a file; it's opening a url. To save the file you would do `file = open('myfile.html', 'wb')` then `file.write(response.read())` – styfle May 20 '11 at 04:13
  • 2
    Note, a response doesn't always include the "Content-Length" header. This will fail for servers that don't support it. – Cerin Dec 14 '13 at 15:26
  • 1
    @styfle: actually, it's opening a URL and *discarding* its bytes. And a `response.read()` would defeat the whole point of a periodic progress report. – MestreLion Jun 07 '14 at 10:53
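
Picking up the points raised in the comments, here is a minimal Python 3 sketch that writes each chunk to a file as it arrives and still reports progress when there is no Content-Length header. The function names (`content_length`, `progress_line`, `save_with_progress`) are illustrative only and are not part of the answer above. The response is assumed to be whatever `urllib.request.urlopen` returns, or any object with a `read(n)` method.

```python
import sys


def content_length(response):
    """Return the declared size in bytes, or None if the header is absent."""
    value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def progress_line(bytes_so_far, total_size):
    if total_size:
        return "Downloaded %d of %d bytes (%0.2f%%)" % (
            bytes_so_far, total_size, 100.0 * bytes_so_far / total_size)
    return "Downloaded %d bytes" % bytes_so_far  # size unknown


def print_report(bytes_so_far, chunk_size, total_size):
    sys.stdout.write(progress_line(bytes_so_far, total_size) + "\r")
    sys.stdout.flush()


def save_with_progress(response, out_file, total_size=None,
                       chunk_size=8192, report_hook=None):
    """Copy response to out_file chunk by chunk; return the byte count."""
    bytes_so_far = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        out_file.write(chunk)  # write immediately instead of buffering it all
        bytes_so_far += len(chunk)
        if report_hook:
            report_hook(bytes_so_far, chunk_size, total_size)
    return bytes_so_far
```

Usage would be along the lines of opening the target with `open(path, "wb")`, calling `urlopen(url)`, and passing both to `save_with_progress` together with `total_size=content_length(response)` and `report_hook=print_report`.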
12

Why not just read the data in chunks and do whatever you want in between, e.g. run the download in a thread, hook into a UI, and so on?

import urllib2

urlfile = urllib2.urlopen("http://www.google.com")

data_list = []
chunk = 4096
while 1:
    data = urlfile.read(chunk)
    if not data:
        print "done."
        break
    data_list.append(data)
    print "Read %s bytes"%len(data)

output:

Read 4096 bytes
Read 3113 bytes
done.
Anurag Uniyal
  • Only thing is, I think the last line should be `print "Read %s bytes"%len(data_list)` – Zac Brown Oct 23 '10 at 20:24
  • @Zachary Brown , No because I am just printing how much data is being read each time, though better would be to print total data read, but still it wouldn't be len(data_list) – Anurag Uniyal Oct 24 '10 at 03:22
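
The "run in a thread" idea from this answer could look roughly like the Python 3 sketch below: a worker thread reads the chunks and posts the running byte count to a queue, and the main thread (where a UI would live) consumes the updates. The names `read_in_background` and `run_demo` are invented for this example, and an in-memory `io.BytesIO` stands in for the real response so the snippet runs without a network connection. The chunks themselves are discarded here, as in a pure progress demo.

```python
import io
import queue
import threading


def read_in_background(response, progress, chunk_size=4096):
    """Read response in chunks, posting the running byte count to progress."""
    total = 0
    while True:
        data = response.read(chunk_size)
        if not data:
            break
        total += len(data)
        progress.put(total)
    progress.put(None)  # sentinel: nothing more to read


def run_demo(response):
    progress = queue.Queue()
    worker = threading.Thread(target=read_in_background,
                              args=(response, progress))
    worker.start()
    updates = []
    while True:
        item = progress.get()  # blocks until the worker posts an update
        if item is None:
            break
        updates.append(item)  # a UI would redraw its progress bar here
    worker.join()
    return updates


# Stand-in for a 7209-byte response (4096 + 3113 bytes, as in the output above)
updates = run_demo(io.BytesIO(b"x" * 7209))
print(updates)
```

Using a queue keeps all UI work on the main thread, which most GUI toolkits require.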
5

urlgrabber has built-in support for progress notification.

joshk0
Ignacio Vazquez-Abrams
1

Simplified version:

import sys
import urllib2

# file_url must already hold the URL to download
temp_filename = "/tmp/" + file_url.split('/')[-1]
f = open(temp_filename, 'wb')
remote_file = urllib2.urlopen(file_url)

try:
    total_size = remote_file.info().getheader('Content-Length').strip()
    header = True
except AttributeError:
    header = False # a response doesn't always include the "Content-Length" header

if header:
    total_size = int(total_size)

bytes_so_far = 0

while True:
    buffer = remote_file.read(8192)
    if not buffer:
        sys.stdout.write('\n')
        break

    bytes_so_far += len(buffer)
    f.write(buffer)
    if not header:
        total_size = bytes_so_far # unknown size

    percent = float(bytes_so_far) / total_size
    percent = round(percent*100, 2)
    sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" % (bytes_so_far, total_size, percent))

f.close() # flush the remaining data to disk
0

A minor modification of Triptych's answer that actually writes the file out (Python 3):

import sys
from urllib.request import urlopen

def chunk_report(bytes_so_far, chunk_size, total_size):
    percent = float(bytes_so_far) / total_size
    percent = round(percent*100, 2)
    sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" %
                     (bytes_so_far, total_size, percent))

    if bytes_so_far >= total_size:
        sys.stdout.write('\n')


def chunk_read(response, chunk_size=8192, report_hook=None):
    total_size = response.info().get("Content-Length").strip()
    total_size = int(total_size)
    bytes_so_far = 0
    data = b""

    while 1:
        chunk = response.read(chunk_size)
        bytes_so_far += len(chunk)

        if not chunk:
            break

        if report_hook:
            report_hook(bytes_so_far, chunk_size, total_size)

        data += chunk

    return data

Usage:

with open(out_path, "wb") as f:
    response = urlopen(filepath)
    data_read = chunk_read(response, report_hook=chunk_report)

    f.write(data_read)
Geoff