Questions tagged [html-tree]

HTML-Tree is the most popular Perl library for parsing HTML into DOM-like trees. It includes HTML::TreeBuilder and HTML::Element.

There are a number of other modules that build on top of HTML-Tree. Some notable ones are:

HTML::TreeBuilder::XPath: adds XPath support to HTML::Element.
pQuery: allows jQuery-like queries.
WWW::Mechanize: automated web browsing in Perl.
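A minimal sketch of the typical HTML::TreeBuilder workflow: parse a string, search with look_down, pull out text, then free the tree. It assumes HTML-Tree is installed from CPAN. The helper name postbody_texts and the sample markup are hypothetical; only new_from_content, look_down, as_text and delete are library methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text of every <div> whose class
# attribute is exactly 'postbody', in document order.
sub postbody_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down compares each attribute value against the given string
    # (exact match) and returns all matching elements in list context.
    my @texts = map { $_->as_text } $tree->look_down(
        _tag  => 'div',
        class => 'postbody',
    );

    # Free the tree explicitly. Older HTML-Tree versions needed this to
    # break parent/child circular references; it is harmless otherwise.
    $tree->delete;
    return @texts;
}

my @found = postbody_texts(
    '<div class="postbody">first</div><div class="other">skip</div>'
  . '<div class="postbody">second</div>'
);
print "$_\n" for @found;
```

Because a plain string is matched against the whole attribute value, an element with class="postbody quote" would not be found; look_down also accepts a regex, e.g. class => qr/(?:^|\s)postbody(?:\s|$)/, to match one class among several.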

33 questions

2 answers

Specify multiple classes in HTML::Element's look_down routine Perl?

I am using HTML::TreeBuilder to parse some HTML. Can you specify multiple classes in the 'look_down' routine? For instance, when searching through HTML using for ( $tree->look_down( 'class' => 'postbody')) I also want to search for an additional…

asked Jul 13 '11 at 10:49

Ebikeneser

1 answer

HTML Treebuilder XPath to Extract Links

I am writing a basic script which just extracts all the links from a web page. It is written in Perl and makes use of WWW::Mechanize and HTML::TreeBuilder::XPath modules, both of which I have installed through CPAN. I know it can be easily done…

html perl xpath html-tree

asked Jul 31 '12 at 12:55

Neon Flash

1 answer

Scrape HTML files with Perl, returning content only, in order

Using HTML::TreeBuilder -- or Mojo::DOM -- I'd like to scrape the content but keep it in order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes). But this in TreeBuilder my…

perl mojolicious html-tree html-treebuilder mojo-dom

asked Sep 02 '15 at 19:34

sqldoug

3 answers

How to find just the direct descendants with HTML::TreeBuilder?

Suppose I have an HTML tree like this: div `- ul `- li (*) `- li (*) `- li (*) `- li (*) `- ul `- li `- li `- li How do I select the li elements that are marked with…

html perl parsing html-tree

asked Jul 14 '12 at 23:18

bodacydo

1 answer

Printing table contents using HTML::TreeBuilder::XPath

I want to extract all the tables from an HTML file and print their contents in the following way: each cell separated by \t, each row separated by \n and each table separated by \n\n. The following is my script. When I changed it to findvalues on tr…

html perl html-table html-tree

asked Aug 06 '13 at 13:22

Nishanth Lawrence Reginold

2 answers

Perl's HTML::Element - dumping just the descendants as HTML

I'm having trouble trying to output the contents of a matched node that I'm parsing:

some text
more text

I'm using HTML::TreeBuilder::XPath to find the node (there's only one div with this class): my…

perl xpath html-parsing html-tree

asked Feb 06 '13 at 13:21

AndyC

2 answers

How do I visualise/pretty-print a HTML DOM tree?

Now that I can navigate a Web page via WWW::Mechanize and get information via HTML::TreeBuilder::XPath by accessing an id, I am left using Firebug to read the DOM in order to discover the layout of the HTML tree. The content that Mechanize captures…

perl xpath mechanize html-tree

asked Jan 26 '12 at 23:28

Ricalsin

1 answer

How to rearrange HTML content with HTML::TreeBuilder

I'm writing a script to rearrange html content and I'm stuck with 2 problems. I have this html structure, which is movie titles and release years with thumbnails grouped in 5 columns. I want to generate new html files with the movies grouped by…

perl html-parsing html-tree

asked Dec 09 '11 at 21:53

theuserid01

1 answer

memory leak in HTML::TreeBuilder

I have some Perl code: use HTML::Parse; use HTML::FormatText; # ... my $txtFormatter = HTML::FormatText->new(); while ( ... ) { # some condition my $txt = # get from a file my $html_tree = HTML::TreeBuilder->new_from_content($txt); …

windows perl memory-leaks html-tree

asked Aug 05 '10 at 17:13

JoelFan

1 answer

How to search for text in html-document with Mechanize?

I am using WWW::Mechanize, HTML::TreeBuilder and HTML::Element in my Perl script to navigate through HTML documents. I want to know how to search for an element that contains a certain string as text. Here is an example of an…

perl mechanize www-mechanize html-tree

asked Jun 08 '15 at 16:06

Hubert Schölnast

2 answers

XPath won't find id

I'm failing to get a node by its id. The code is straightforward and should be self-explanatory. #!/usr/bin/perl use Encode; use utf8; use LWP::UserAgent; use URI::URL; use Data::Dumper; use HTML::TreeBuilder::XPath; my $url =…

perl xpath html-tree html-treebuilder

asked Sep 13 '14 at 16:17

3und80

2 answers

Perl HTML::TreeBuilder adding <html>, <head> and <body> tags to parsed content, how to stop or work around it?

Background: I'm using HTML::TreeBuilder to parse an entire HTML page, say "whole_page" for reference's sake. I'm then using the inherited parse_content method (same as for whole_page) of a new TreeBuilder object to parse a chunk of HTML, say…

html perl parsing html-tree

asked Oct 12 '11 at 17:05

s2cuts

1 answer

Add a UL to a LI (not add a LI to a UL)

I am trying to add a UL inside of a LI. I have an HTML tree that looks like this:

All My Windows

javascript html html-lists html-tree

asked Apr 26 '16 at 23:59

Rob

2 answers

HTML::Tree: Can't call method "as_text" on an undefined value

I am parsing a real estate web page, using HTML::TreeBuilder, and have the following code: $values{"Pcity"} = $address->look_down("_tag" => "span", "itemprop" => "addressLocality")->as_text; $values{"PState"} =…

perl optimization html-parsing html-tree

asked Sep 06 '14 at 20:52

user4035

1 answer

How does one -- in Perl -- stream a list of URLs from a file into an array to then recursively acquire all of their HTML data in a single file?

Another laborious title... Sorry... Anyway, I've got a file called mash.txt with a bunch of URLs like this in it: http://www... http://www... http://www... . . . So, at this point, I'd like to feed these (URLs) into an array--possibly without having…

perl file stream append html-tree

asked Mar 04 '14 at 00:00

user3333975