I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I can successfully extract the images.

HTML (what I currently parse):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I construct the image name from what I parse:

Current Code

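# Imports assumed; the original post does not show them (Python 2, BeautifulSoup 3)
import os
import urlparse
from urllib import urlopen, urlretrieve
from BeautifulSoup import BeautifulSoup as bs

def main(url, output_folder="~/images"):
    """Download the images at url"""
    # Expand "~" so the files land in the home directory, not in a folder named "~"
    output_folder = os.path.expanduser(output_folder)
    soup = bs(urlopen(url))
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        # Resolve a relative src against the page URL
        image_url = urlparse.urljoin(url, image['src'])
        # Build the file name from the last path segment of src, minus the query string
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)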

What I would like to extract is:

alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

I also want to use the alt text as the file name when I save the image.

add-semi-colons
  • You are using `image['src']` to get the source. Can't you just use `image['alt']` to get the alt, or am I misunderstanding your question? – BrtH Jul 27 '12 at 23:14

1 Answer

Inside your for loop, you can get it with:

image.get('alt', '')

This is explained in BeautifulSoup's documentation ("The attributes of Tags").
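
To illustrate why `.get` matters here, the following is a minimal sketch rather than a definitive implementation. It assumes each image object supports dict-style `.get()`, as BeautifulSoup tags do: `tag['alt']` raises `KeyError` when the attribute is missing, while `tag.get('alt', '')` returns the default. The helper name `images_with_alt` and the example.com URLs are hypothetical, and plain dicts stand in for parsed `img` tags so the snippet runs without BeautifulSoup installed.

```python
def images_with_alt(images):
    """Return (alt, src) pairs for images that have a non-empty alt and a src.

    `images` is any iterable of objects with a dict-style .get(),
    for example the result of soup.findAll("img").
    """
    pairs = []
    for image in images:
        # .get() returns the default instead of raising KeyError
        alt = image.get("alt", "").strip()
        src = image.get("src", "")
        if alt and src:
            pairs.append((alt, src))
    return pairs


# Plain dicts stand in for parsed <img> tags (an assumption for this sketch).
sample_images = [
    {"alt": "Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver",
     "src": "http://example.com/camera.JPG"},
    {"src": "http://example.com/no-alt.JPG"},  # no alt attribute: skipped
]
pairs = images_with_alt(sample_images)
```

In the loop from the question, the same check could be written as `if not image.get('alt'): continue` before building the file name.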

Gonzalo
  • A KeyError means that a particular img tag doesn't have an alt attribute. Are you sure every image on the page has alt text associated with it? – larissa Jul 27 '12 at 23:31
  • Edited the answer; it should work for the case @anyaMairead mentions. – Gonzalo Jul 27 '12 at 23:33
  • Actually, some don't have one. I am trying to skip those that don't. – add-semi-colons Jul 28 '12 at 00:00
  • @GonzaloDelgado Thanks. How can I use the alt information as the filename? – add-semi-colons Jul 28 '12 at 00:39
  • It depends on how you want the filename to look. You can just mix it into the filename construction in your sample code, though there's plenty of room for improvement there; I'd suggest asking about that at Code Review: http://codereview.stackexchange.com/ – Gonzalo Jul 28 '12 at 00:46
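
One possible way to turn alt text into a file name is sketched below. This is an illustration under stated assumptions, not the approach anyone in the thread confirmed: the helper name `alt_to_filename`, the `.jpg` default extension, and the choice to replace unsafe characters with underscores are all hypothetical.

```python
import re


def alt_to_filename(alt, extension=".jpg"):
    """Turn alt text into a file name that is safe on common filesystems."""
    # Replace each run of characters other than letters, digits, "." or "-" with "_".
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", alt).strip("_.")
    # Fall back to a fixed name if nothing usable is left.
    return (name or "image") + extension


filename = alt_to_filename("Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver")
# filename == "Sony_Cyber-shot_DSC-W570_16.1_MP_Digital_Camera_-_Silver.jpg"
```

Two images with the same alt text would map to the same file name, so appending the loop counter from the question's code is one way to avoid overwriting files.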