1

How do I click an image like below using Python mechanize?

<a href="..."><img name="next" id="next" src="..."></a>

I know the name and id of the image I want to click to. I need to somehow identify the parent link and click it. How can I?

Bonus Question: How can I check if there is such an image or not?

Chetter Hummin
  • 6,687
  • 8
  • 32
  • 44
yasar
  • 13,158
  • 28
  • 95
  • 160

3 Answers3

5

Rather than using mechanize, it's very simple to do with bs4 (beautifulsoup 4).

from bs4 import BeautifulSoup
import urllib2
text = urllib2.urlopen("http://yourwebpage.com/").read()
soup = BeautifulSoup(text)
img = soup.find_all('img',{'id':'next'})
if img:
    a_tag = img[0].parent
    href = a_tag.get('href')
    print href

Retrieving the parent tag is very easy with bs4, as it happens with nothing less than .parent after finding the tag of course with the find_all function. As the find_all function returns an array, it's best to do if img: in the future, but as this may not apply to your website, it'll be safe to do. See below.

EDIT: I have changed the code to include the "Bonus question", which is what I described above as an alternative.

Hairr
  • 1,088
  • 2
  • 11
  • 19
0

For your bonus question - I would say you can use BeautifulSoup to check to see whether or not the img element works. You can use urllib to see if the image exists (at least, whether or not the server will pass it to you - otherwise you'll get an error back).

You can also check out this thread that someone more intelligent than I answered - it seems to discuss a library called SpiderMonkey and the inability for mechanize to click a button.

Community
  • 1
  • 1
jimf
  • 4,527
  • 1
  • 16
  • 21
0

Well, I don't know how to do it using Mechanize, however I know how to do in using lxml:

Lets assume that our webpage has this code: <a href="page2.html"><img name="bla bla" id="next" src="Cat.jpg"></a>. Using lxml we would write this code:

from lxml import html
page = urlllib2.urlopen('http://example.com')
tree = html.fromstring(page.read())
link = tree.xpath('//img[@id="next"]/ancestor::a/attribute::href')

Most of the magic happens in the tree.xpath function, where you define the image you're looking for first with //img[@id="next"], then you specify that you're looking for the a tag right before it: /ancestor::a and that you're looking for specifically the href attribute: /attribute::href. The link variable will now contain a list of strings matching that query - in this case link[0] will be page2.html - which you can urlopen(), thus effectively clicking it.

For the //img[@id="next"] part, you can use other attribute, for example this: //img[@name="bla bla"] and it's going to work perfectly fine. You just need to think which attribute is better for this situation.

I know this answer doesn't use Mechanize, however I hope it's a helpful pointer. Good luck!

Protagonist
  • 492
  • 1
  • 6
  • 17