22

I'm sorry to have to ask something like this, but Python's mechanize documentation seems to be really lacking and I can't figure this out. The only example I can find for following a link is:

response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

But I don't want to use a regex; I just want to follow a link based on its url. How would I do this? Also, what is the "nr" argument that is sometimes used when following links?

Thanks for any info

Rick
  • Just realized that I may have had an error in my headers which was preventing the links from working.. thanks to the people who helped I think your answers will work for me and I found another, more straightforward way to do it on another site so I will post that here too for reference once I'm done – Rick Aug 25 '10 at 20:51

4 Answers

50

br.follow_link takes either a Link object or a keyword arg (such as nr=0).

br.links() lists all the links.

br.links(url_regex='...') lists all the links whose urls match the regex.

br.links(text_regex='...') lists all the links whose link text matches the regex.

br.follow_link(nr=num) follows the num-th link on the page, counting from 0. It returns a response object (the same kind that br.open(...) returns).

br.find_link(url='...') returns the Link object whose url exactly equals the given url.

br.find_link, br.links, br.follow_link, br.click_link all accept the same keywords. Run help(br.find_link) to see documentation on those keywords.
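Since these methods share their keywords, following a link by its exact url can be sketched as below. This is a minimal sketch, not the library's documented recipe: `follow_exact_url` is a hypothetical helper name, and `br` is assumed to be a `mechanize.Browser` that already has a page open.

```python
def follow_exact_url(br, target_url):
    """Follow the first link whose url equals target_url.

    br is assumed to be a mechanize.Browser that already has a page open.
    """
    # url= is compared as a plain string, so no regex escaping is involved;
    # nr=0 (the default) picks the first link that matches.
    return br.follow_link(url=target_url, nr=0)
```

After `br.open(...)`, a call such as `follow_exact_url(br, 'http://www.rfc-editor.org/rfc/rfc2606.txt')` would return the response for that link, or raise `mechanize.LinkNotFoundError` if no link on the page has that url.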

Edit: If you have a target url that you wish to follow, you could do something like this:

import mechanize
br = mechanize.Browser()
response = br.open("http://www.example.com/")
target_url = 'http://www.rfc-editor.org/rfc/rfc2606.txt'
for link in br.links():
    print(link)
    # Link(base_url='http://www.example.com/', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])
    print(link.url)
    # http://www.rfc-editor.org/rfc/rfc2606.txt
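    if link.url == target_url:
        print('match found')
        # match found
        break
else:
    # the loop finished without break, so no link matched
    raise LookupError('no link with url %r' % target_url)

br.follow_link(link)   # link is the matching Link, because the loop ended with break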
print(br.geturl())
# http://www.rfc-editor.org/rfc/rfc2606.txt
unutbu
  • @Rick: If you loop through `br.links()`, you can look at the string `link.url` to figure out if you want to follow it or not. No regex required. – unutbu Aug 25 '10 at 20:25
  • thanks, I think I got it now... i don't know what it is but the versions of python mech that I have (latest ver) doesn't seem to have much in its doc file, not sure why.. anyways, thanks for the help and I think I can get it based on what you said, will try – Rick Aug 25 '10 at 20:30
  • 1
    I still can't figure out how to get a link to match, I am trying to use the regex as the full url but its not giving a match (when I do the for loop it never enters the loop implying it is not getting any matches) – Rick Aug 25 '10 at 20:37
  • @Rick: Regex is tricky. Some characters in your url like `.*+?()[]` all have different meanings in the context of a regex pattern as opposed to plain string comparison. Since you have the full url, you can use `==` to compare the url against `link.url`. I've added some code to show what I mean. – unutbu Aug 25 '10 at 20:53
  • thanks, I have a lot of regex experience I think the issue was that I had a problem in my headers, I appreciate your help and I found another way to do it without using regex so I will post that for reference once I test it – Rick Aug 25 '10 at 21:02
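To illustrate the point about special characters raised in the comments above, here is a minimal sketch using only the standard library. The URL is hypothetical and chosen because it contains `?` and `+`; no mechanize call is involved.

```python
import re

# Hypothetical URL containing characters that are special in a regex.
target_url = "http://www.example.com/search.php?q=cheese+shop"

# Used directly as a pattern, "?" makes the preceding "p" optional and
# "+" means "one or more e", so the pattern fails to match its own text.
unescaped_hit = re.search(target_url, target_url)

# re.escape() turns every special character into a literal one.
escaped_hit = re.search(re.escape(target_url), target_url)

print(unescaped_hit is None)    # True
print(escaped_hit is not None)  # True
```

If a regex keyword such as `url_regex` has to be used with a full url, escaping it with `re.escape` first is one option; comparing with `==` or passing `url=` avoids the issue altogether.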
16

I found this way to do it, for reference for anyone who doesn't want to use regex:

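import mechanize
br = mechanize.Browser()
r = br.open("http://www.somewebsite.com")
br.find_link(url='http://www.somewebsite.com/link1.html')  # raises LinkNotFoundError if nothing matches
req = br.click_link(url='http://www.somewebsite.com/link1.html')  # returns a request, does not open it
br.open(req)
print(br.response().read())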

It also works with the link's text:

r = br.open("http://www.somewebsite.com")
br.find_link(text='Click this link')
req = br.click_link(text='Click this link')
br.open(req)
print(br.response().read())
Rick
  • 2
    I like this solution a lot better than the one I suggested. (I think it even works without the calls to `br.find_link`). Please accept this one so it will bubble to the top. – unutbu Aug 26 '10 at 12:16
2

From looking at the code, I suspect you want

response1 = br.follow_link(link=LinkObjectToFollow)

nr is the same as documented under the find_link call.

EDIT: In my first cursory glance, I didn't realize "link" wasn't a simple link.

jkerian
  • I found the 'nr' info in the code itself. _mechanize.py in the doctext for find_link... right around line 614 – jkerian Aug 25 '10 at 19:57
  • oh right I didn't even think that they would have a doc file there different from the online version, as I'm used to it also being online, thanks for the tip – Rick Aug 25 '10 at 20:16
2

nr selects which link to follow when more than one link matches the given text or url. The default is 0, so by default you follow the first matching link. For example, given this source:

<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>

Both links have the text "Click this link", but here we want link2.html specifically, which is the second match:

req = br.click_link(text='Click this link', nr=1)

This gives you a request for link2.html; pass it to br.open(req), or call br.follow_link with the same arguments, to get the response.
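The way nr counts can be modelled in a few lines of plain Python. This is only a sketch of the selection rule described above, not mechanize's own code: `pick_link` is a hypothetical helper, and the links are represented as simple (url, text) pairs.

```python
def pick_link(links, text, nr=0):
    """Return the url of the nr-th link whose text equals `text` (0-based)."""
    # Keep only the links that match, preserving page order.
    matches = [url for url, link_text in links if link_text == text]
    if nr >= len(matches):
        raise LookupError("no link number %d with text %r" % (nr, text))
    return matches[nr]


# The two links from the example source, as (url, text) pairs.
links = [
    ("link.html", "Click this link"),
    ("link2.html", "Click this link"),
]

print(pick_link(links, "Click this link"))        # link.html
print(pick_link(links, "Click this link", nr=1))  # link2.html
```

The point of the model is that nr indexes only the links that pass the other criteria, not every link on the page.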

Yuda Prawira