
I use the following code to stream large files from the Internet into a local file:

import urllib2

fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:   # iterates over the response line by line
    fp.write(line)
fp.close()

This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)

hoju

4 Answers


There is no reason to work line by line (small chunks, and it requires Python to find the line ends for you!); just read it in bigger chunks, e.g.:

# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break   # an empty chunk means the download is complete
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
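If it helps, here is one rough way to compare a few sizes; this is only a sketch, and the URL, output path, and helper name download_with_chunk are placeholders, not part of the answer above:

import time
from urllib.request import urlopen

def download_with_chunk(url, path, chunk_size):
    # same chunked-read loop as above, parameterized by chunk_size
    response = urlopen(url)
    with open(path, 'wb') as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)

test_url = 'http://example.com/some-large-file'   # placeholder URL
for size in (4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024):
    start = time.time()
    download_with_chunk(test_url, '/tmp/chunk-test', size)
    print(size, time.time() - start)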

Alex Martelli
  • thanks Alex - looks like that was my problem because most of the lines were only a few hundred bytes. – hoju Oct 05 '09 at 00:32
  • russenreaktor, using the with open(...) as ...: has an implicit close() called upon leaving the with statement. – mklauber Aug 29 '11 at 18:33
  • Using `for chunk in iter(lambda: f.read(CHUNK), ''):` instead of `while True:` is also more pythonic (see the sketch after these comments). – Loïc G. Jan 06 '12 at 22:33
  • @russenreaktor if you use the construct with open(...) as ...: you do not have to manually take care of close(). – andilabs Aug 02 '13 at 12:12
  • Is it possible to do this at an OS/lower network protocol level? – Jean Aug 23 '20 at 15:33
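A minimal sketch of the iter() idiom from the comment above, assuming Python 3 (where the response yields bytes, so the sentinel has to be b'' rather than ''):

from urllib.request import urlopen

CHUNK = 16 * 1024
response = urlopen(url)
with open(file, 'wb') as f:
    # read() returns b'' at end of stream, which stops the iterator
    for chunk in iter(lambda: response.read(CHUNK), b''):
        f.write(chunk)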

You can also use shutil:

import shutil
try:
    from urllib.request import urlopen # Python 3
except ImportError:
    from urllib2 import urlopen # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        # copyfileobj streams from req to fp in length-sized chunks
        shutil.copyfileobj(req, fp, length)
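For example (the URL and local filename here are just placeholders):

get_large_file('http://www.example.com/some-large-file.tar.gz',
               'some-large-file.tar.gz')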
Tiago
  • +1, this does exactly the same as Alex Martelli suggested. And it accepts the `length` parameter (`shutil.copyfileobj(fsrc, fdst[, length])`), which also defaults to 16 * 1024. – Antony Hatchkins May 13 '11 at 05:07

I used to use the mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but a recent release fixed this bug and it now works very quickly.

Example:

import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')

Mechanize is based on urllib2, so urllib2 may also have a similar method... but I can't find one right now.

liori

You can use urlretrieve() to download files:

Example:

try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url, "./local_file")
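urlretrieve also accepts an optional reporthook callback, which can be useful for showing progress on large downloads; a minimal sketch (the callback name is my own, and the URL above is still a placeholder):

def report(block_num, block_size, total_size):
    # called periodically with the count of blocks transferred so far,
    # the block size in bytes, and the total file size (-1 if unknown)
    downloaded = block_num * block_size
    if total_size > 0:
        print("%d / %d bytes" % (min(downloaded, total_size), total_size))

urlretrieve(url, "./local_file", reporthook=report)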
Aravindh