I'm trying to follow a link in Perl. My initial code:

use WWW::Mechanize::Firefox;
use Crypt::SSLeay;
use HTML::TagParser;
use URI::Fetch;
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME}=0; #not verifying certificate
my $url = 'https://';
$url = $url . $ARGV[0]; # host name taken from the first command-line argument

my $mech = WWW::Mechanize::Firefox->new;
$mech->get($url);

$mech->follow_link(tag => 'a', text => '<span class=\"normalNode\">VSCs</span>');
$mech->reload();

I found here that the tag and text options work this way, but I got the error MozRepl::RemoteObject: SyntaxError: The expression is not a legal expression. I tried escaping some characters in the text, but the error stayed the same. I then changed my code, adding:

my @list = $mech->find_all_links();
my $found = 0;
my $i = 0;
while ($i <= $#list && $found == 0) {
    print $list[$i]->url() . "\n";
    if ($list[$i]->text() =~ /VSCs/) {
        print $list[$i]->text() . "\n";
        my $follow = $list[$i]->url();
        $mech->follow_link(url => $follow);
        $found = 1; # stop after the first matching link
    }
    $i++;
}

But this fails too, with the error No link found matching '//a[(@href = "https://... followed by a lot more text that seems to be the link's description. I hope I've made myself clear; if not, please tell me what else to add. Thanks to all for your help.

Here's the part where the link I want to follow is:

<li id="1" class="liClosed"><span class="bullet clickable">&#160;</span><b><a href="/centcfg/vsc_list.asp?entity=allvsc&amp;selector=All"><span class="normalNode">VSCs</span></a></b>
      <ul id="1.l1">
        <li id="i1.i1" class="liBullet"><span class="bullet">&#160;</span><b><a href="/centcfg/vsc_edit.asp?entity=vsc&amp;selector=1"><span class="normalNode">First</span></a></b></li>
        <li id="i1.i2" class="liBullet"><span class="bullet">&#160;</span><b><a href="/centcfg/vsc_edit.asp?entity=vsc&amp;selector=2"><span class="normalNode">Second</span></a></b></li>
        <li id="i1.i3" class="liBullet"><span class="bullet">&#160;</span><b><a href="/centcfg/vsc_edit.asp?entity=vsc&amp;selector=3"><span class="normalNode">Third</span></a></b></li>
        <li id="i1.i4" class="liBullet"><span class="bullet">&#160;</span><b><a href="/centcfg/vsc_edit.asp?entity=vsc&amp;selector=4"><span class="normalNode">Fourth</span></a></b></li>
        <li id="i1.i5" class="liBullet"><span class="bullet">&#160;</span><b><a href="/centcfg/vsc_edit.asp?entity=vsc&amp;selector=5"><span class="normalNode">None</span></a></b></li>
</ul>

I'm working on Windows 7 with MozRepl 1.1 and 64-bit Strawberry Perl 5.16.2.1.

Malincy Montoya
  • I tried recently to install W::M::F, but couldn't get as far as you got. What platform/version are you running? What firefox and mozrepl.xpi versions are you running? What versions of the perl modules (MozRepl, W::M::F, etc) are you running? :-) – David-SkyMesh Dec 19 '12 at 22:57
  • I've edited my post again. I don't know my W::M::F version; it's just whichever one the cpan shell installed :P – Malincy Montoya Dec 20 '12 at 15:55

2 Answers

After poking around with the given code, I was able to make W::M::F follow the link in the following manner:

use WWW::Mechanize::Firefox;
use Crypt::SSLeay;
use HTML::TagParser;
use URI::Fetch;

...

$mech->follow_link(xpath => '//a[text() = "<span class=\"normalNode\">VSCs</span>"]');
$mech->reload();

Note the xpath parameter given in place of text.

I didn't look long at the W::M::F sources, but under the hood it tries to translate the given text parameter into an XPath string. If the text contains XML/HTML tags, as it does in your case, that translation probably breaks.
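
To illustrate why embedded markup could break such a query, here is a minimal sketch in plain Perl. It assumes the module wraps the text in a query shaped like //a[(text()="...")], which is the form quoted in an error message elsewhere on this page. build_text_xpath and breaks_double_quoted_literal are hypothetical helpers for illustration, not part of W::M::F. XPath 1.0 string literals have no escape sequences, so a double quote inside the text ends the literal early, with or without a backslash in front of it.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the query shape from the error message
# ('//a[(text()="VSCs")]'); this is not the module's actual code.
sub build_text_xpath {
    my ($text) = @_;
    return qq{//a[(text()="$text")]};
}

# XPath 1.0 literals cannot escape their own delimiter, so any double
# quote in the text terminates the "..." literal early.
sub breaks_double_quoted_literal {
    my ($text) = @_;
    return index($text, '"') >= 0 ? 1 : 0;
}

# In a single-quoted Perl string, \" stays as a backslash plus a quote.
my $text = '<span class=\"normalNode\">VSCs</span>';

print build_text_xpath('VSCs'), "\n";
print build_text_xpath($text), "\n";
print "this text breaks the literal\n" if breaks_double_quoted_literal($text);
```

Printing both queries side by side shows where the second one stops being a well-formed expression.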

Nikolay A
  • Well, it follows a link but a stylesheet one. The first link in the HTML actually. Any idea why? I'm a little confused – Malincy Montoya Dec 20 '12 at 16:59
  • Have you tried to add `tag => 'a'` parameter back to a `follow_link()` call? I'm not sure why it didn't recognize XPath query. – Nikolay A Dec 21 '12 at 09:29
  • Btw, just tried to call follow_link() like `$mech->follow_link(tag => 'a', text_regex => qr/VSCs/);` and it led me to the correct link. – Nikolay A Dec 21 '12 at 09:57
  • it's following it, but for some reason, the page stays the same. I'll try a different approach. Thank you for helping me!! – Malincy Montoya Dec 21 '12 at 17:59

I recommend you try:

$mech->follow_link( url_regex => qr/selector=All/ );
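
As a rough illustration of what a pattern like this would select, the sketch below applies the same regex to a few href values from the question's HTML in plain Perl. No browser is involved, and the @hrefs list is copied by hand (with &amp; decoded to &) rather than fetched, so treat it as an assumption about what the page exposes.

```perl
use strict;
use warnings;

# href values copied by hand from the HTML in the question.
my @hrefs = (
    '/centcfg/vsc_list.asp?entity=allvsc&selector=All',
    '/centcfg/vsc_edit.asp?entity=vsc&selector=1',
    '/centcfg/vsc_edit.asp?entity=vsc&selector=2',
);

# qr// builds a compiled regex object that can be passed around as a value.
my $pattern = qr/selector=All/;

# Keep only the hrefs that the pattern matches.
my @matches = grep { $_ =~ $pattern } @hrefs;
print "$_\n" for @matches;
```

The value handed to url_regex should be the qr// object itself. Writing /$pattern/ at the call site runs a match against $_ right there instead of passing a pattern along.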
Gilles Quénot
  • Yes, I did. But I got the error **No link found matching '//a[(text()="VSCs")]'**. I think it's because the text is not _VSCs_; it is _<span class="normalNode">VSCs</span>_ – Malincy Montoya Dec 19 '12 at 22:35
  • Can you post a more complete sample HTML ? I see no 'a' tag there. Or better, the real URL – Gilles Quénot Dec 19 '12 at 22:36
  • Posting the real URL isn't possible; I just edited my post. And what do you mean by "I see no 'a' tag there"? – Malincy Montoya Dec 19 '12 at 22:43
  • @Malincy Montoya: the span tag is not text, and I think "VSCs" isn't the a tag's text, it is the span tag's text. – ysth Dec 19 '12 at 22:54
  • @Maerlyn: looks like an answer to me, just an untested (and in the event, not working) one. Still an answer. – ysth Dec 19 '12 at 22:55
  • @sputnick I tried it and it got stuck; it never finished. I removed qr and got the error **No elements found for Button with name 'url_regex'**. I also tried putting the pattern in a variable, `my $pat = 'selector=All'; my $found = qr/$pat/; $mech->follow_link( url_regex => /$found/);`, but I got the same error – Malincy Montoya Dec 20 '12 at 16:21