
I want to download all the files from a web page, specifically all of the image files. The urllib module seems to be what I need. There is a method to download a file if you know its filename, but I don't know the filenames in advance:

urllib.urlretrieve('http://www.example.com/page', 'myfile.jpg')

Is there a method to download all the files from the page and maybe return a list?

Brock123
  • possible duplicate of [Web scraping with Python](http://stackoverflow.com/questions/2081586/web-scraping-with-python) – Mat Oct 01 '11 at 08:01
  • Can't find much info. Perhaps a small example script? – Brock123 Oct 01 '11 at 08:19
  • Brock123, did you read the link @Mat posted above? It points you toward [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) for scraping the page, which you can use to find all the URLs of the files you then wish to download. – John Keyes Oct 01 '11 at 10:24

1 Answer


Here's a little example to get you started with BeautifulSoup for this kind of exercise: you give this script a URL, and it prints out the URL of every image referenced from that page, i.e. each img tag whose src attribute ends in .jpg or .png:

import sys, urllib, re, urlparse
from BeautifulSoup import BeautifulSoup

# Expect exactly one command-line argument: the URL of the page to scan.
if not len(sys.argv) == 2:
    print >> sys.stderr, "Usage: %s <URL>" % (sys.argv[0],)
    sys.exit(1)

url = sys.argv[1]

# Fetch the page and parse it with BeautifulSoup.
f = urllib.urlopen(url)
soup = BeautifulSoup(f)

# Find every img tag whose src attribute ends in ".jpg" or ".png"
# (case-insensitively), and resolve each src against the page URL so
# that relative links become absolute.
for i in soup.findAll('img', attrs={'src': re.compile(r'(?i)\.(jpg|png)$')}):
    full_url = urlparse.urljoin(url, i['src'])
    print "image URL:", full_url

Then you can use urllib.urlretrieve to download each of the images pointed to by full_url, but at that stage you have to decide how to name them and what to do with the downloaded images, which isn't specified in your question.
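For example, here's a minimal sketch of that last step. It assumes, purely as one illustrative choice, that you want to save each image into the current directory under the last path segment of its URL:

import os, urllib, urlparse

def download_image(full_url):
    # Derive a local filename from the last path segment of the URL.
    # This naming scheme is just an assumption for illustration; it
    # doesn't deal with duplicate names or query strings specially.
    path = urlparse.urlsplit(full_url).path
    filename = os.path.basename(path) or 'unnamed-image'
    urllib.urlretrieve(full_url, filename)
    return filename

You could then call download_image(full_url) in place of (or after) the print statement in the loop above.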

Mark Longair