Questions tagged [html-tree]

HTML-Tree is a Perl library for parsing HTML into DOM-like trees. It includes HTML::TreeBuilder and HTML::Element.
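
As a minimal sketch of what "parsing HTML into a DOM-like tree" looks like in practice, the snippet below builds a tree with HTML::TreeBuilder and queries it with HTML::Element's look_down. The input markup and the class name 'postbody' are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input; any HTML string would do.
my $html = '<div><p class="postbody">first</p><p class="other">second</p></div>';

# Parse the string into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every element matching all criteria.
my @posts = $tree->look_down( _tag => 'p', class => 'postbody' );

# Collect the text before the tree is released.
my @texts = map { $_->as_text } @posts;
print "$_\n" for @texts;

# Explicitly free the tree when done with it.
$tree->delete;
```

Calling look_down in scalar context returns only the first match, or undef when nothing matches, so it is worth checking the result before chaining a method call onto it.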

A number of other modules build on top of HTML-Tree.

33 questions
5
votes
2 answers

Specify multiple classes in HTML::Element's look_down routine Perl?

I am using HTML::TreeBuilder to parse some HTML. Can you specify multiple classes in the 'look_down' routine? For instance, when searching through HTML using for ( $tree->look_down( 'class' => 'postbody')) I also want to search for an additional…
Ebikeneser
5
votes
1 answer

HTML Treebuilder XPath to Extract Links

I am writing a basic script which just extracts all the links from a web page. It is written in Perl and makes use of the WWW::Mechanize and HTML::TreeBuilder::XPath modules, both of which I have installed through CPAN. I know it can be easily done…
Neon Flash
4
votes
1 answer

Scrape HTML files with Perl, returning content only, in order

Using HTML::TreeBuilder -- or Mojo::DOM -- I'd like to scrape the content but keep it in order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes) But this in TreeBuilder my…
sqldoug
4
votes
3 answers

How to find just the direct descendants with HTML::TreeBuilder?

Suppose I've a HTML tree like this: div `- ul `- li (*) `- li (*) `- li (*) `- li (*) `- ul `- li `- li `- li How do I select the <li> elements that are marked with…
bodacydo
    3
    votes
    1 answer

    Printing table contents using HTML::TreeBuilder::XPath

    I want to extract all the tables from an HTML file and print their contents in the following way: each cell separated by \t, each row separated by \n, and each table separated by \n\n. The following is my script, when I changed it to findvalues on tr…
    3
    votes
    2 answers

    Perl's HTML::Element - dumping just the descendants as HTML

    I'm having trouble trying to output the contents of a matched node that I'm parsing:
    some text
    more text
    I'm using HTML::TreeBuilder::XPath to find the node (there's only one div with this class): my…
    AndyC
    2
    votes
    2 answers

    How do I visualise/pretty-print a HTML DOM tree?

    Now that I can navigate a Web page via WWW::Mechanize and get information via HTML::TreeBuilder::XPath by accessing an id, I am left using Firebug to read the DOM in order to discover the layout of the HTML tree. The content that Mechanize captures…
    Ricalsin
    2
    votes
    1 answer

    How to rearrange HTML content with HTML::TreeBuilder

    I'm writing a script to rearrange html content and I'm stuck with 2 problems. I have this html structure, which is movie titles and release years with thumbnails grouped in 5 columns. I want to generate new html files with the movies grouped by…
    2
    votes
    1 answer

    memory leak in HTML::TreeBuilder

    I have some Perl code: use HTML::Parse; use HTML::FormatText; # ... my $txtFormatter = HTML::FormatText->new(); while ( ... ) { # some condition my $txt = # get from a file my $html_tree = HTML::TreeBuilder->new_from_content($txt); …
    JoelFan
    2
    votes
    1 answer

    How to search for text in html-document with Mechanize?

    I am using WWW::Mechanize, HTML::TreeBuilder and HTML::Element in my Perl script to navigate through HTML documents. I want to know how to search for an element that contains a certain string as text. Here is an example of an…
    Hubert Schölnast
    2
    votes
    2 answers

    XPath won't find id

    I'm failing to get a node by its id. The code is straightforward and should be self-explanatory. #!/usr/bin/perl use Encode; use utf8; use LWP::UserAgent; use URI::URL; use Data::Dumper; use HTML::TreeBuilder::XPath; my $url =…
    3und80
    1
    vote
    2 answers

    Perl HTML::TreeBuilder adding <html>, <head> and <body> tags to parsed content, how to stop or work around it?

    Background: I'm using HTML::TreeBuilder to parse an entire html page, say "whole_page" for reference's sake. I'm then using the inherited parse_content method (same as for whole_page) of a new TreeBuilder object to parse a chunk of html, say…
    s2cuts
    1
    vote
    1 answer

    Add a UL to a LI (not add a LI to a UL)

    I am trying to add a UL inside of a LI. I have an HTML tree that looks like this:
  • All My Windows
    1
    vote
    2 answers

    HTML::Tree: Can't call method "as_text" on an undefined value

    I am parsing a real estate web page, using HTML::TreeBuilder, and have the following code: $values{"Pcity"} = $address->look_down("_tag" => "span", "itemprop" => "addressLocality")->as_text; $values{"PState"} =…
    user4035
    1
    vote
    1 answer

    How does one -- in Perl -- stream a list of URLs from a file into an array to then recursively acquire all of their HTML data in a single file?

    Another laborious title... Sorry... Anyway, I've got a file called mash.txt with a bunch of URLs like this in it: http://www... http://www... http://www... . . . So, at this point, I'd like to feed these (URLs) into an array--possibly without having…
    user3333975