I've looked all around Google and its archives. There are several good articles, but none seem to help me out. So I thought I'd come here for a more specific answer.
The Objective: I want to run this script against a website to download all of its picture files at once. It'll save a lot of pointing and clicking.
I've got Python 2.3.5 on a Windows 7 x64 machine. It's installed in C:\Python23.
How do I get this script to "go", so to speak?
=====================================
Seeing as this is the top result on Google, here's a useful link I've found over the years:
http://learnpythonthehardway.org/book/ex1.html
For setup, see exercise 0.
=====================================
As requested, here's the code I'm using:
"""
dumpimages.py
    Downloads all the images on the supplied URL, and saves them to the
    specified output folder ("/test/" by default)

Usage:
    python dumpimages.py http://example.com/ [output]
"""
from BeautifulSoup import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys

# Backslashes are escaped here: a string ending in a single backslash
# before the closing quote is a syntax error.
def main(url, out_folder="C:\\asdf\\"):
    """Downloads all the images at 'url' to out_folder"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))

    for image in soup.findAll("img"):
        print "Image: %(src)s" % image
        # Use the last path component of the image URL as the local file name.
        filename = image["src"].split("/")[-1]
        parsed[2] = image["src"]
        outpath = os.path.join(out_folder, filename)
        if image["src"].lower().startswith("http"):
            # Absolute URL: download it directly.
            urlretrieve(image["src"], outpath)
        else:
            # Otherwise rebuild the URL from the page's scheme and host.
            urlretrieve(urlparse.urlunparse(parsed), outpath)

def _usage():
    print "usage: python dumpimages.py http://example.com [outpath]"

if __name__ == "__main__":
    # The last argument is either the URL or the output folder.
    url = sys.argv[-1]
    out_folder = "/test/"
    if not url.lower().startswith("http"):
        out_folder = sys.argv[-1]
        url = sys.argv[-2]
        if not url.lower().startswith("http"):
            _usage()
            sys.exit(-1)
    main(url, out_folder)
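For anyone trying to work out what the script expects on the command line, here is a minimal sketch of the argument handling at the bottom of the script, pulled into a standalone function so it can be tried without touching the network. The name `parse_args` and its `default_folder` parameter are hypothetical (they are not in the script above). Unlike the original block, the sketch also checks the argument count, so it returns `None` instead of raising an error when too few arguments are given. It avoids `print` statements so that it should run on Python 2 or Python 3.

```python
def parse_args(argv, default_folder="/test/"):
    """Return (url, out_folder) from an argv-style list, or None if no URL is found.

    Hypothetical helper for illustration; it mirrors the checks in the
    script's __main__ block, plus a guard on the number of arguments.
    """
    if len(argv) < 2:
        # Only the script name was given.
        return None
    url = argv[-1]
    out_folder = default_folder
    if not url.lower().startswith("http"):
        # The last argument is not a URL, so treat it as the output folder
        # and expect the URL just before it.
        if len(argv) < 3:
            return None
        out_folder = argv[-1]
        url = argv[-2]
        if not url.lower().startswith("http"):
            return None
    return url, out_folder


if __name__ == "__main__":
    import sys
    result = parse_args(sys.argv)
    if result is None:
        sys.stdout.write("usage: python dumpimages.py http://example.com [outpath]\n")
    else:
        sys.stdout.write("url=%s out_folder=%s\n" % result)
```

Assuming the script is saved as dumpimages.py in the current directory and Python is installed in C:\Python23 as described above, the matching command at a Windows command prompt would be something like: C:\Python23\python.exe dumpimages.py http://example.com/ C:\asdf\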