
In Python 2.7.3, I am trying to write a script that downloads a file over the Internet, using the urllib2 module.

Here is what I have done:

import urllib2

HTTP_client = urllib2.build_opener()
# Here I can modify the HTTP_client headers
URL = 'http://www.google.com'
data = HTTP_client.open(URL)
with open('file.txt', 'wb') as f:
    f.write(data.read())

OK. That works perfectly.

The problem is when I want to save big files (hundreds of MB). I think that when I call the 'open' method, it downloads the whole file into memory. But what about large files? It won't keep 1 GB of data in memory! And what happens if I lose the connection? All of the downloaded data is lost.

How can I download large files in Python the way wget does? wget writes the file 'directly' to the hard disk; you can see the file growing in size.

I'm surprised there is no 'retrieve' method for doing something like:

HTTP_client.retrieve(URL, 'filetosave.ext')
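
(As far as I know, the plain urllib module does have something close to this, urllib.urlretrieve, but then I seem to lose the custom opener and its headers:)

import urllib

# streams the response to disk, but bypasses my urllib2 opener
urllib.urlretrieve(URL, 'filetosave.ext')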
Boris
    Possible duplicate of [Stream large binary files with urllib2 to file](http://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file) – Samuele Mattiuzzo Jan 14 '16 at 16:27

1 Answer


To resolve this, you can read the response a chunk at a time and write each chunk to the file.

import urllib2

CHUNK = 16 * 1024  # read 16 KiB at a time

# url is the source URL, file is the destination path
req = urllib2.urlopen(url)
with open(file, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(chunk)
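
For what it's worth, the standard library's shutil.copyfileobj does the same chunked copy in one call; a minimal sketch, assuming the same url and file placeholders as above:

import shutil
import urllib2

req = urllib2.urlopen(url)
with open(file, 'wb') as fp:
    # copy in 16 KiB chunks so the whole body is never held in memory
    shutil.copyfileobj(req, fp, 16 * 1024)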
    This is an answer copied from another thread; you should have linked to it or credited it at least. (http://stackoverflow.com/a/1517728/754484) – Samuele Mattiuzzo Jan 14 '16 at 16:27