Using Web::Scraper

Question

I'm trying to parse some HTML tags using the Perl module Web::Scraper, but it seems I'm inept at Perl. Could anyone look for mistakes in my code?

This is my HTML to parse (two URLs inside li tags):

<more html above here>
<div class="span-48 last">
<div class="span-37">
  <div id="zone-extract" class="123">
      <h2 class="genres"></h2>  
                <li><a href="**URL_TO_EXTRACT_1**">1</a></li>
                <li><a class="sel" href="**URL_TO_EXTRACT_2**">2</a></li>
        <li class="first">Pàg</li>
  </div>
</div>      
</div>
<more stuff from here>

I'm trying to obtain:

ID:1 Link:URL_TO_EXTRACT_1

ID:2 Link:URL_TO_EXTRACT_2

With this Perl code:

my $scraper = scraper {
    process ".zone-extract > a[href]", urls => '@href', id => 'TEXT';
    result 'urls';
};
my $links = $scraper->scrape($response);

This is one of the countless process combinations I tried, with two different results: an empty return, or all the URLs in the page source (and I only need the links inside zone-extract).

Resolved with mob's contribution: #zone-extract instead of .zone-extract :)

In the previous episode: http://stackoverflow.com/a/9821254/46395 — daxim, Mar 22 '12 at 19:20
Isn't `.zone-extract` for elements with the `class="zone-extract"` attribute? For `id="zone-extract"` I'd think you'd want `#zone-extract`, no? — mob, Mar 22 '12 at 19:24
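To illustrate mob's point, here is a minimal sketch of the CSS-selector version, assuming a trimmed copy of the markup from the question. Besides switching `.` to `#`, it also swaps the child combinator `>` for a plain descendant selector, since the `a` tags sit inside `li` elements rather than directly under the div. The variable names are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Web::Scraper;

# Trimmed copy of the markup from the question (assumption: the rest of
# the page does not matter for the selector).
my $html = q[
<div id="zone-extract" class="123">
<li><a href="**URL_TO_EXTRACT_1**">1</a></li>
<li><a class="sel" href="**URL_TO_EXTRACT_2**">2</a></li>
</div>
];

my $scraper = scraper {
    # '#zone-extract' matches id="zone-extract" ('.zone-extract' would look
    # for class="zone-extract"). The space is a descendant combinator;
    # '#zone-extract > a' would only match <a> tags that are direct children.
    process '#zone-extract a[href]', 'urls[]' => '@href';
};

my $res = $scraper->scrape(\$html);

# Print one extracted href per line.
print "$_\n" for @{ $res->{urls} || [] };
```

The `'urls[]'` key asks Web::Scraper to collect every match into an array reference; a plain `urls` key would keep a single value.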

score 2 · Accepted Answer · edited Mar 16 '17 at 20:49

#!/usr/bin/env perl 
use strict;
use warnings;

use Web::Scraper;

my $html = q[
<div class="span-48 last">
<div class="span-37">
<div id="zone-extract" class="123">
<h2 class="genres"></h2>  
<li><a href="**URL_TO_EXTRACT_1**">1</a></li>
<li><a class="sel" href="**URL_TO_EXTRACT_2**">2</a></li>
<li class="first">Pàg</li>
</div>
</div>      
</div>
];      # / (turn off wrong syntax highlighting)

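my $parser = scraper {
    # XPath: every <a> that is a descendant of the div with id="zone-extract";
    # 'urls[]' collects one value per matched element into an array ref.
    process '//div[@id="zone-extract"]//a', 'urls[]' => sub {
        my $url = $_[0]->attr('href');   # $_[0] is the matched element
        return $url;
    };
};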

my $ref = $parser->scrape(\$html);

print "$_\n" for @{ $ref->{urls} };
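The question also asks for an ID next to each link. One possible way to get both, sketched here under the assumption that the link text (1, 2) is the ID, is to return a hash reference from the callback. The key names `id` and `url` and the `links` result key are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Web::Scraper;

# Trimmed copy of the markup from the question.
my $html = q[
<div id="zone-extract" class="123">
<li><a href="**URL_TO_EXTRACT_1**">1</a></li>
<li><a class="sel" href="**URL_TO_EXTRACT_2**">2</a></li>
</div>
];

my $parser = scraper {
    # Same XPath as above; the callback returns one hash ref per <a>.
    process '//div[@id="zone-extract"]//a', 'links[]' => sub {
        my $elem = $_[0];
        return {
            id  => $elem->as_text,       # text inside the <a> tag
            url => $elem->attr('href'),  # value of the href attribute
        };
    };
};

my $ref = $parser->scrape(\$html);

# One "ID:<text> Link:<href>" line per extracted link.
print "ID:$_->{id} Link:$_->{url}\n" for @{ $ref->{links} || [] };
```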
