4

Hi everyone. I am new to Python and am using Python 2.5 on CentOS.

I need to download files the way wget does.

I have done some searching, and there are a few solutions; an obvious one is this:

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
output = open('test.mp3','wb')
output.write(mp3file.read())
output.close()

This works fine. But I want to know: if the MP3 file is very large, say 1 GB, 2 GB or even bigger, will this code snippet still work? Are there better ways to download large files in Python, maybe with a progress bar like wget has?

Thanks a lot!

DocWiki
  • I assume your question is about iteratively reading and writing a chunk at a time, as opposed to reading the entire file into memory at once only to write it all out to the disk afterwards. – chrisaycock Dec 09 '10 at 21:31
  • possible duplicate of [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file) – Katriel Dec 09 '10 at 21:31

4 Answers

16

There's an easier way:

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "/home/download/mp3.mp3")
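Since you also asked about a progress bar: urlretrieve takes an optional reporthook callback that it calls after each block, so you can print your own progress. A rough sketch (it assumes the server sends a Content-Length header; otherwise total_size is -1 and nothing is printed):

import sys
import urllib

def report(block_count, block_size, total_size):
    # total_size is -1 if the server did not send a Content-Length header
    if total_size > 0:
        percent = min(100, block_count * block_size * 100 / total_size)
        sys.stdout.write("\rdownloaded %d%%" % percent)
        sys.stdout.flush()

urllib.urlretrieve("http://www.example.com/songs/mp3.mp3",
                   "/home/download/mp3.mp3", report)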
Paul Schreiber
3

For really big files, your code would use a lot of memory, since you load the whole file into memory at once. It is better to read and write the data in chunks:

from __future__ import with_statement  # the 'with' statement needs this on Python 2.5
import urllib2

mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3', 'wb') as output:
    while True:
        buf = mp3file.read(65536)  # read 64 KiB at a time
        if not buf:                # an empty string means the download is done
            break
        output.write(buf)
Sven Marnach
2

Why not just call wget then?

import os
os.system("wget http://www.example.com/songs/mp3.mp3")
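
If you shell out like this and want to know whether the download actually succeeded, subprocess lets you check the exit status instead of os.system; a rough sketch:

import subprocess

# wget exits with status 0 on success and non-zero on failure
status = subprocess.call(["wget", "http://www.example.com/songs/mp3.mp3"])
if status != 0:
    print "wget failed with exit code %d" % status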
chrisaycock
  • Thanks for this. Is this method safe? It seems like it could easily lead to a system crash... because few people use this method, as far as I know. – DocWiki Dec 09 '10 at 21:57
  • And there seems to be no way to know whether the wget method is successful or not. Please check this page: `http://linux.byexamples.com/archives/366/python-how-to-run-a-command-line-within-python/` – DocWiki Dec 09 '10 at 22:03
  • @DocWiki I prefer `curl` myself to `wget`. System calls are always a tricky proposition though. I voted for @Paul's `urlretrieve` answer myself. – chrisaycock Dec 09 '10 at 22:26
  • This also has the downside (depending on what you're doing) that it requires wget--i.e. it won't work on Windows, unlike a pure Python solution. – Thomas K Dec 09 '10 at 23:37
1

Your current code will read the entire stream into memory before writing it to disk, so for files larger than your available memory you will run into problems.

To resolve this, you can read a chunk at a time and write it out as you go.


(copied from [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file))

from __future__ import with_statement  # needed for 'with' on Python 2.5
import urllib2

url = "http://www.example.com/songs/mp3.mp3"
CHUNK = 16 * 1024  # 16 KiB per read
req = urllib2.urlopen(url)
with open('test.mp3', 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)

"experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements."

Corey Goldberg