
I use the following code to stream large files from the Internet into a local file:

import urllib2

fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:   # iterates over the response line by line
    fp.write(line)
fp.close()

This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)

hoju

4 Answers


There is no reason to work line by line (small chunks, and it requires Python to find the line ends for you!); just read it in bigger chunks, e.g.:

# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break   # an empty chunk means the download is complete
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
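If it helps, here is one rough way to compare a few sizes; this is only a sketch, and the URL, output path, and helper name download_with_chunk are placeholders, not part of the answer above:

import time
from urllib.request import urlopen

def download_with_chunk(url, path, chunk_size):
    # same chunked-read loop as above, parameterized by chunk_size
    response = urlopen(url)
    with open(path, 'wb') as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)

test_url = 'http://example.com/some-large-file'   # placeholder URL
for size in (4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024):
    start = time.time()
    download_with_chunk(test_url, '/tmp/chunk-test', size)
    print(size, time.time() - start)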

Alex Martelli
  • thanks Alex - looks like that was my problem because most of the lines were only a few hundred bytes. – hoju Oct 05 '09 at 00:32
  • russenreaktor, using the with open(...) as ...: has an implicit close() called upon leaving the with statement. – mklauber Aug 29 '11 at 18:33
  • Using `for chunk in iter(lambda: f.read(CHUNK), ''):` instead of `while True:` is also more pythonic (see the sketch after these comments). – Loïc G. Jan 06 '12 at 22:33
  • @russenreaktor if you use the construct with open(...) as ...: you do not have to manually take care of close(). – andilabs Aug 02 '13 at 12:12
  • Is it possible to do this at an OS/lower network protocol level? – Jean Aug 23 '20 at 15:33
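A minimal sketch of the iter() idiom from the comment above, assuming Python 3 (where the response yields bytes, so the sentinel has to be b'' rather than ''):

from urllib.request import urlopen

CHUNK = 16 * 1024
response = urlopen(url)
with open(file, 'wb') as f:
    # read() returns b'' at end of stream, which stops the iterator
    for chunk in iter(lambda: response.read(CHUNK), b''):
        f.write(chunk)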

You can also use shutil:

import shutil
try:
    from urllib.request import urlopen # Python 3
except ImportError:
    from urllib2 import urlopen # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        # copyfileobj streams from req to fp in length-sized chunks
        shutil.copyfileobj(req, fp, length)
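For example (the URL and local filename here are just placeholders):

get_large_file('http://www.example.com/some-large-file.tar.gz',
               'some-large-file.tar.gz')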
Tiago
  • +1, this does exactly the same as Alex Martelli suggested. And it accepts the `length` parameter (`shutil.copyfileobj(fsrc, fdst[, length])`), which also defaults to 16 * 1024. – Antony Hatchkins May 13 '11 at 05:07

I used to use the mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but a recent release fixed this bug and it now works very quickly.

Example:

import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')

Mechanize is based on urllib2, so urllib2 may also have a similar method... but I can't find one right now.

liori

You can use urlretrieve() to download files:

Example:

try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url, "./local_file")
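urlretrieve also accepts an optional reporthook callback, which can be useful for showing progress on large downloads; a minimal sketch (the callback name is my own, and the URL above is still a placeholder):

def report(block_num, block_size, total_size):
    # called periodically with the count of blocks transferred so far,
    # the block size in bytes, and the total file size (-1 if unknown)
    downloaded = block_num * block_size
    if total_size > 0:
        print("%d / %d bytes" % (min(downloaded, total_size), total_size))

urlretrieve(url, "./local_file", reporthook=report)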
Aravindh