
How would I download files (videos) with Python using wget and save them locally? There will be a bunch of files, so how do I know when one file has finished downloading, so that I can automatically start downloading the next one?

Thanks.

CoreIs
    How would you do it? First search for all the previous questions exactly like yours: http://stackoverflow.com/questions/tagged/wget+python. Second, read this specific question: http://stackoverflow.com/questions/419235/anyone-know-of-a-good-python-based-web-crawler-that-i-could-use – S.Lott Mar 18 '10 at 10:03

6 Answers


Short answer (simplified): to get one file,

 import urllib.request
 # the "local" directory must already exist; urlretrieve does not create it
 urllib.request.urlretrieve("http://google.com/index.html", filename="local/index.html")

You can figure out how to loop that if necessary.
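A minimal sketch of that loop, assuming the URLs are already in a list and each file should be named after the last part of its URL (`download_all` and `dest_dir` are hypothetical names). Because `urlretrieve` only returns once the whole file has been written, the loop itself answers the "how do I know it's finished" part: the next iteration starts only after the previous download is done.

```python
import os
import urllib.request


def download_all(urls, dest_dir="."):
    """Download each URL in turn (hypothetical helper, sequential by design)."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # name the local file after the last path component of the URL
        filename = os.path.join(dest_dir, url.rstrip("/").split("/")[-1])
        # urlretrieve blocks until the file has been fully written
        path, _headers = urllib.request.urlretrieve(url, filename=filename)
        # control only reaches this line once the download has finished
        saved.append(path)
    return saved
```

Errors (HTTP 404, network failures) raise an exception and stop the loop; wrap the call in `try`/`except` if you'd rather skip failed files.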

TheGrimmScientist
Mark Lakata

Don't shell out to wget for this. Use either urllib2 (urllib.request in Python 3) or urlgrabber instead.
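For reference, `urllib2` exists only in Python 2; in Python 3 its functionality lives in `urllib.request`. A sketch of a streaming download with it, assuming large video files you'd rather not hold in memory (`fetch` and the 64 KiB chunk size are illustrative choices, not part of any library API):

```python
import shutil
import urllib.request


def fetch(url, filename, chunk_size=64 * 1024):
    """Stream url into filename chunk by chunk (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        # copy the response body to disk without loading it all at once
        shutil.copyfileobj(response, out, chunk_size)
    # both handles are closed here, so the file is complete on disk
    return filename
```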

Drew Dormann
Ignacio Vazquez-Abrams

If you use os.system() to spawn a process for the wget, it will block until wget finishes the download (or quits with an error). So, just call os.system('wget blah') in a loop until you've downloaded all of your files.
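The same idea with the `subprocess` module, which the Python documentation recommends over `os.system()`. This is a sketch that assumes a `wget` binary on the PATH; `wget_command` and `download_with_wget` are hypothetical helper names. Passing an argument list instead of a shell string also avoids quoting problems with unusual URLs.

```python
import os
import subprocess


def wget_command(url, dest_dir):
    """Build the argument list for one wget call (no shell involved)."""
    # -q: quiet output; -P: directory prefix to save the file into
    return ["wget", "-q", "-P", dest_dir, url]


def download_with_wget(urls, dest_dir="."):
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        # subprocess.run waits for wget to exit, so each download finishes
        # before the next starts; check=True raises CalledProcessError
        # if wget exits with a non-zero status
        subprocess.run(wget_command(url, dest_dir), check=True)
```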

Alternatively, you can use urllib2 or httplib (urllib.request and http.client in Python 3). You'll have to write a non-trivial amount of code, but you'll get better performance, since you can reuse a single HTTP connection to download many files, as opposed to opening a new connection for each file.
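A sketch of the connection-reuse idea with Python 3's `http.client` (the successor to `httplib`), assuming all files live on one plain-HTTP host; `download_same_host` is a hypothetical helper. Each response has to be read to the end before the next request goes out on the same connection. If the server does not keep the connection alive, `http.client` should open a fresh one for the next request, so you simply lose the reuse benefit.

```python
import http.client
import os


def download_same_host(host, paths, dest_dir=".", port=None):
    """Fetch several paths from one host over a single HTTP connection."""
    os.makedirs(dest_dir, exist_ok=True)
    conn = http.client.HTTPConnection(host, port)
    saved = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status != 200:
                resp.read()  # drain the body before giving up
                raise OSError("GET %s failed: %d %s" % (path, resp.status, resp.reason))
            filename = os.path.join(dest_dir, path.rstrip("/").split("/")[-1])
            with open(filename, "wb") as out:
                # read the whole body in chunks; it must be consumed fully
                # before the connection can carry the next request
                while True:
                    chunk = resp.read(64 * 1024)
                    if not chunk:
                        break
                    out.write(chunk)
            saved.append(filename)
    finally:
        conn.close()
    return saved
```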

Adam Rosenfield
  • Isn't `os.system()` discouraged, with `subprocess` recommended as the alternative? – alper Dec 29 '21 at 00:18

No reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve or an equivalent.

Edit: to answer the second part of your question, you can set up a thread pool using the standard library Queue class (queue.Queue in Python 3). Downloading is I/O-bound and the threads spend most of their time waiting on the network, so the GIL shouldn't be a problem. Generate a list of the URLs you wish to download and feed them to your work queue; each worker thread pulls the next request off the queue as soon as it finishes its current one.

I'm waiting for a database update to complete, so I put this together real quick.


#!/usr/bin/env python3

import logging
import sys
import threading
import urllib.request
from queue import Queue

NUM_THREADS = 5


class Downloader(threading.Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # sentinel: (None, None) tells this worker to exit
            if not download_url:
                return
            try:
                urllib.request.urlretrieve(download_url, filename=save_as)
            except Exception as e:
                logging.warning("error downloading %s: %s", download_url, e)


if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in range(NUM_THREADS):
        threads.append(Downloader(queue))
        threads[-1].start()

    for line in sys.stdin:
        url = line.strip()
        if not url:
            continue  # a blank line would otherwise look like the sentinel
        filename = url.split('/')[-1]
        print("Download %s as %s" % (url, filename))
        queue.put((url, filename))

    # if we get here, stdin has hit EOF (^D)
    print("Finishing current downloads")
    for i in range(NUM_THREADS):
        queue.put((None, None))
    # wait until every worker has drained the queue and exited
    for t in threads:
        t.join()
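On Python 3 the same worker-pool pattern can be had without managing threads and sentinels by hand, by swapping the `Queue` for `concurrent.futures.ThreadPoolExecutor` (a different technique from the script above, not a fix to it). A sketch, with `download_one` and `download_parallel` as hypothetical names and five workers mirroring the script:

```python
import concurrent.futures
import os
import urllib.request


def download_one(url, dest_dir):
    """Save one URL under its last path component (hypothetical helper)."""
    filename = os.path.join(dest_dir, url.rstrip("/").split("/")[-1])
    urllib.request.urlretrieve(url, filename=filename)
    return filename


def download_parallel(urls, dest_dir=".", max_workers=5):
    """Download urls with a thread pool; returns {url: path or exception}."""
    os.makedirs(dest_dir, exist_ok=True)
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, url, dest_dir): url for url in urls}
        # as_completed yields each future as soon as its download finishes
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # record the failure instead of aborting the whole batch
                results[url] = exc
    return results
```

Collecting exceptions per URL, rather than letting one failure stop everything, matches the log-and-continue behaviour of the original script.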

McJeff
    there is a mistake in `download_url, save_as = queue.get()`. should be `download_url, save_as = self.queue.get()`. – disfated Nov 26 '11 at 02:08

Install the wget package from PyPI (http://pypi.python.org/pypi/wget/0.3):

pip install wget

then run it as documented:

python -m wget <url>
BozoJoe
    For anyone else who found this confusing, the linked library doesn't use wget. It uses urllib. And it currently doesn't support anything close to what wget ( http://www.gnu.org/software/wget/ ) does. – Rob Russell Dec 31 '13 at 16:58

No reason to use python. Avoid writing a shell script in Python and go with something like bash or an equivalent.

davr
    Writing a shell script in Python is OK. If you want to get something done quickly but you hate the syntax of bash, just do it in Python. If you make a larger project, then yes, try to avoid these external calls. – Jabba Apr 11 '12 at 07:10
    Python is a fine scripting language. – Mark Lakata Nov 15 '12 at 00:00