4

Hi everyone. I am new to Python and am using Python 2.5 on CentOS.

I need to download files the way wget does.

I have done some searching, and there are a few solutions; an obvious one is this:

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
output = open('test.mp3','wb')
output.write(mp3file.read())
output.close()

This works fine. But I want to know: if the MP3 file is very large, say 1 GB, 2 GB or even bigger, will this code snippet still work? Are there better ways to download large files in Python, maybe with a progress bar like wget has?

Thanks a lot!

DocWiki
  • I assume your question is about iteratively reading and writing a chunk at a time, as opposed to reading the entire file into memory at once only to write it all out to the disk afterwards. – chrisaycock Dec 09 '10 at 21:31
  • possible duplicate of [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file) – Katriel Dec 09 '10 at 21:31

4 Answers

16

There's an easier way:

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "/home/download/mp3.mp3")
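Since you also asked about a progress bar: urlretrieve takes an optional reporthook callback that it calls after each block, so you can print your own progress. A rough sketch (it assumes the server sends a Content-Length header; otherwise total_size is -1 and nothing is printed):

import sys
import urllib

def report(block_count, block_size, total_size):
    # total_size is -1 if the server did not send a Content-Length header
    if total_size > 0:
        percent = min(100, block_count * block_size * 100 / total_size)
        sys.stdout.write("\rdownloaded %d%%" % percent)
        sys.stdout.flush()

urllib.urlretrieve("http://www.example.com/songs/mp3.mp3",
                   "/home/download/mp3.mp3", report)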
Paul Schreiber
3

For really big files, your code would use a lot of memory, since you load the whole file into memory at once. It is better to read and write the data in chunks:

from __future__ import with_statement  # the 'with' statement needs this on Python 2.5
import urllib2

mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3', 'wb') as output:
    while True:
        buf = mp3file.read(65536)  # read 64 KiB at a time
        if not buf:                # an empty string means the download is done
            break
        output.write(buf)
Sven Marnach
2

Why not just call wget then?

import os
os.system("wget http://www.example.com/songs/mp3.mp3")
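
If you shell out like this and want to know whether the download actually succeeded, subprocess lets you check the exit status instead of os.system; a rough sketch:

import subprocess

# wget exits with status 0 on success and non-zero on failure
status = subprocess.call(["wget", "http://www.example.com/songs/mp3.mp3"])
if status != 0:
    print "wget failed with exit code %d" % status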
chrisaycock
  • Thanks for this. Is this method safe? It seems like it could easily lead to a system crash... because few people use this method, as far as I know. – DocWiki Dec 09 '10 at 21:57
  • And there seems to be no way to know whether the wget method is successful or not. Please check this page: `http://linux.byexamples.com/archives/366/python-how-to-run-a-command-line-within-python/` – DocWiki Dec 09 '10 at 22:03
  • @DocWiki I prefer `curl` myself to `wget`. System calls are always a tricky proposition though. I voted for @Paul's `urlretrieve` answer myself. – chrisaycock Dec 09 '10 at 22:26
  • This also has the downside (depending on what you're doing) that it requires wget--i.e. it won't work on Windows, unlike a pure Python solution. – Thomas K Dec 09 '10 at 23:37
1

Your current code will read the entire stream into memory before writing it to disk, so for files larger than your available memory you will run into problems.

To resolve this, you can read a chunk at a time and write it out as you go.


(copied from [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file))

from __future__ import with_statement  # needed for 'with' on Python 2.5
import urllib2

url = "http://www.example.com/songs/mp3.mp3"
CHUNK = 16 * 1024  # 16 KiB per read
req = urllib2.urlopen(url)
with open('test.mp3', 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)

"experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements."

Corey Goldberg