I'm failing to get a node by its id. The code is straightforward and should be self-explanatory.

#!/usr/bin/perl
use Encode; 
use utf8;
use LWP::UserAgent;   
use URI::URL; 
use Data::Dumper;
use HTML::TreeBuilder::XPath;

my $url = 'https://www.airbnb.com/rooms/1976460';
my $browser = LWP::UserAgent->new;
my $resp = $browser->get( $url, 'User-Agent' => 'Mozilla/5.0' );

if ($resp->is_success) {
    my $base = $resp->base || '';
    print "-> base URL: $base\n";
    my $data = $resp->decoded_content;

    my $tree= HTML::TreeBuilder::XPath->new;
    $tree->parse_content( $resp->decoded_content() );
    binmode STDOUT, ":encoding(UTF-8)";
    my $price_day = $tree->find('.//*[@id="price_amount"]/');
    print Dumper($price_day);

    $tree->delete();
}

The code above prints:

-> base URL: https://www.airbnb.com/rooms/1976460
$VAR1 = undef;

How can I select a node by its ID?

Thanks in advance.

3und80
  • Offtopic, but `perl -Mojo -E 'say g("https://www.airbnb.com/rooms/1976460")->dom->find(q{div[id="price_amount"]})->text'` prints `$285`. Mojo::DOM is a nice module... – clt60 Sep 13 '14 at 16:34
  • Thanks for the hint! I looked further into Mojo and like that it uses CSS selectors instead of XPath. – 3und80 Sep 16 '14 at 06:20

2 Answers

Take that / off the end of that XPath.

.//*[@id="price_amount"]

should do. As it is, it's not valid XPath.
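To make the corrected expression concrete, here is a minimal offline sketch. It assumes a small inline HTML string as a stand-in for the downloaded page (the `$html` content is hypothetical, not the real Airbnb markup) and uses the `findnodes` and `findvalue` methods that HTML::TreeBuilder::XPath documents for XPath queries.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for the fetched page, so no network access is needed.
my $html = '<html><body><div id="price_amount">$285</div></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Corrected expression: nothing follows the closing bracket of the predicate.
my $xpath = './/*[@id="price_amount"]';

# In list context findnodes returns the matching elements;
# findvalue returns the text content of the match as a string.
my ($node) = $tree->findnodes($xpath);
my $price  = $tree->findvalue($xpath);

print defined $node ? $node->tag : 'no match', "\n";
print "$price\n";
```

Checking `defined $node` before using it avoids calling a method on an undefined value when the id is absent from the page. Call `$tree->delete` once you are done with the nodes, as the question's code does.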

Flynn1179
  • Very strange: with perl v5.10 it does not work, but with perl v5.18.2 it does. Maybe this helps others. – 3und80 Sep 16 '14 at 06:19

There is a trailing slash in your XPath that you need to remove:

my $price_day = $tree->find('.//*[@id="price_amount"]');

However, from my own testing, I believe that HTML::TreeBuilder::XPath is also having trouble parsing the page at that specific URL, perhaps because of its conditional comments.

As an alternative approach, I would recommend using Mojo::UserAgent and Mojo::DOM instead.

The following uses the CSS selector div#price_amount to find your desired element and print it out.

use strict;
use warnings;

use Mojo::UserAgent;

my $url = 'https://www.airbnb.com/rooms/1976460';
my $dom = Mojo::UserAgent->new->get($url)->res->dom;

my $price_day = $dom->at(q{div#price_amount})->all_text;

print $price_day, "\n";

Outputs:

$285
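One caveat with the chained call above: `at` returns undef when no element matches, so `->all_text` would then die. A minimal defensive sketch, assuming an inline HTML string in place of the live page and a hypothetical helper named `text_by_selector`:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical markup standing in for the fetched page.
my $html = '<div id="price_amount">$285</div><div id="other">n/a</div>';
my $dom  = Mojo::DOM->new($html);

# Hypothetical helper: return the text of the first element matching
# the CSS selector, or undef if nothing matches.
sub text_by_selector {
    my ($dom, $selector) = @_;
    my $el = $dom->at($selector);
    return defined $el ? $el->all_text : undef;
}

my $price   = text_by_selector($dom, 'div#price_amount');
my $missing = text_by_selector($dom, 'div#no_such_id');

print defined $price   ? "$price\n" : "price not found\n";
print defined $missing ? "$missing\n" : "no_such_id not found\n";
```

This keeps the script from failing outright if the site changes its markup and the id disappears.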

Note: there is a helpful 8-minute introductory video on this set of modules at Mojocast Episode 5.

Miller