Highest Voted 'html-treebuilder' Questions

5 votes, 2 answers

Perl extract pattern from html file

I have a .html file full of links. I would like to extract the domains without the http:// (just the hostname portion of each link, e.g. blah.com), list them, and remove duplicates. This is what I have come up with so far; I think the issue is the…

perl uri html-treebuilder

asked Mar 16 '14 at 14:04 by user3425810

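Below is a minimal sketch of one way to do this with HTML::TreeBuilder and the URI module. It assumes the links are `href` attributes on `<a>` tags. The helper name `unique_link_hosts` and the sample HTML are hypothetical. Hash keys remove the duplicates, and relative links are skipped because they have no hostname.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# Hypothetical helper: return the sorted, de-duplicated hostnames of all
# absolute http(s) links found in an HTML string.
sub unique_link_hosts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %seen;
    for my $link ($tree->look_down(_tag => 'a', href => qr/\S/)) {
        my $uri = URI->new($link->attr('href'));
        # Relative links (no scheme) and schemes such as mailto: are skipped.
        next unless ($uri->scheme // '') =~ /^https?$/;
        $seen{ lc $uri->host } = 1;    # hash keys drop duplicates
    }
    $tree->delete;    # release the parse tree
    return sort keys %seen;
}

my $html = <<'HTML';
<a href="http://blah.com/page1">one</a>
<a href="https://www.example.org/x?y=1">two</a>
<a href="http://blah.com/page2">three</a>
<a href="/relative/path">four</a>
HTML

print "$_\n" for unique_link_hosts($html);
```

Calling `host` only after the scheme check matters. A schemeless URI object does not provide that method.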

4 votes, 1 answer

Scrape HTML files with Perl, returning content only, in order

Using HTML::TreeBuilder (or Mojo::DOM), I'd like to scrape the content but keep it in order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes). But this in TreeBuilder my…

perl mojolicious html-tree html-treebuilder mojo-dom

asked Sep 02 '15 at 19:34 by sqldoug

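One possible TreeBuilder approach is a recursive walk over `content_refs_list`. That method returns references to an element's children in document order, so text nodes can be both collected and rewritten in place. This is a sketch. The sub name `templatize` and the `{{varN}}` placeholder format are hypothetical choices, not a templating convention from the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Walk the tree depth-first (document order). Each non-blank text node is
# saved in @$texts and replaced in place by a hypothetical "{{varN}}" marker.
sub templatize {
    my ($elem, $texts) = @_;
    for my $ref ($elem->content_refs_list) {
        if (ref $$ref) {                 # child element: recurse into it
            templatize($$ref, $texts);
        }
        elsif ($$ref =~ /\S/) {          # text node with visible content
            push @$texts, $$ref;
            $$ref = '{{var' . $#$texts . '}}';
        }
    }
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<h1>Title</h1><p>Hello <b>world</b>!</p>'
);
my @texts;
templatize($tree, \@texts);
my $template = $tree->as_HTML;    # HTML with placeholders instead of text
$tree->delete;

print "$_\n" for @texts;
print "$template\n";
```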

3 votes, 2 answers

Use HTML::TreeBuilder in Perl to extract all instances of a specific span class

Trying to make a Perl script to open an HTML file and extract anything contained within tags. Sample HTML: when parsing HTML in Perl

I HAVE SOLVED THIS: Turns out the page I was loading with WWW::Mechanize uses AJAX to load all the content that is inside the … so it is not loaded when I created the $html variable. Now I must see how to get this dynamic content... I am…

html perl html-tableextract html-treebuilder

asked Feb 10 '14 at 22:18 by AsocPro

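The span's class name was lost from the excerpt, so this sketch uses a hypothetical class `price`. It shows a detail of `look_down` that often matters here. A plain string value only matches a `class` attribute that is exactly that string. A regex that treats the attribute as space-separated tokens also finds spans carrying several classes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<p><span class="price">10</span> <span class="price sale">8</span>
<span class="label">n/a</span></p>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Exact match: only class="price" qualifies.
my @exact = map { $_->as_text }
    $tree->look_down(_tag => 'span', class => 'price');

# Token match: "price" as one of several space-separated classes.
my @token = map { $_->as_text }
    $tree->look_down(_tag => 'span', class => qr/(?:^|\s)price(?:\s|$)/);

$tree->delete;
print "exact: @exact\ntoken: @token\n";
```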

1 vote, 1 answer

How to loop the result from findnodes() with HTML::TreeBuilder::XPath

I have a script to monitor some Facebook pages. Since the Facebook API revoked the public page access permission on 4-SEP-2019, I need to parse the content with XPath. Each Facebook post is wrapped by div[contains(@class,"userContentWrapper")]. I would…

perl xpath html-treebuilder

asked Sep 05 '19 at 18:08 by ต้อง เอกมัย

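A sketch of looping over `findnodes()` results with HTML::TreeBuilder::XPath follows. The sample markup is hypothetical. In list context `findnodes` returns the matching nodes, so a plain `foreach` works. Each node can run its own query, but the query must be relative (starting with `.`). Otherwise it searches the whole document again on every iteration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<div class="userContentWrapper x"><p>First post</p></div>
<div class="userContentWrapper y"><p>Second post</p></div>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
my @posts;

# findnodes returns a list in list context, so foreach can iterate it.
for my $post ($tree->findnodes('//div[contains(@class,"userContentWrapper")]')) {
    # ".//p" stays inside the current post; a bare "//p" would not.
    my ($p) = $post->findnodes('.//p');
    push @posts, $p ? $p->as_text : '';
}
$tree->delete;
print "$_\n" for @posts;
```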

1 vote, 2 answers

XPath nodes text joined by br

How can I join the text nodes between br tags, again separated by br? Here is the XML code:

text1.
text2.
text3.

ad sense code

text4.

ad sense code

textxx.

I…

dom xpath libxml2 html-treebuilder

asked Aug 29 '19 at 18:07 by daliaessam

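The sample markup in the excerpt was stripped, so this sketch assumes a hypothetical container `<div id="content">`. In it, text lines are separated by `<br>`, with ad markup (here a `<script>`) interleaved. It uses HTML::TreeBuilder's `content_list` instead of XPath:

- direct text children are kept;
- element children (`<br>`, the ad) are skipped;
- the trimmed pieces are re-joined with `<br>`.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<div id="content">text1.<br>text2.<br>text3.<br>
<script>/* ad sense code */</script>text4.<br>
<script>/* ad sense code */</script>textxx.</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $div  = $tree->look_down(_tag => 'div', id => 'content');

# content_list returns the direct children: plain strings for text nodes,
# HTML::Element objects for tags such as <br> and <script>.
my @lines = grep { /\S/ }                 # drop whitespace-only pieces
            map  { s/^\s+|\s+$//gr }      # trim each text node
            grep { !ref } $div->content_list;

my $joined = join '<br>', @lines;
$tree->delete;
print "$joined\n";
```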

1 vote, 1 answer

Perl Mechanize identify content between span tag within specific div tag

Perl WWW::Mechanize::Firefox has successfully retrieved the contents of the web page and stored them in the scalar variable $content. my $url = 'http://finance.yahoo.com/quote/AAPL/financials?p=AAPL'; $mech->get($url); my $content=…

perl www-mechanize-firefox html-tableextract html-treebuilder

asked Mar 13 '17 at 18:51 by Brian Douglas

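Once the page source is in `$content`, HTML::TreeBuilder can narrow the search in two steps. First it locates one `<div>`. Then it calls `look_down` again on that element, which only searches the div's descendants. The `id="target"` div and the sample markup are hypothetical stand-ins, because the excerpt does not show the page structure.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Stand-in for the $content that $mech->get($url) retrieved.
my $content = <<'HTML';
<div id="other"><span>ignore me</span></div>
<div id="target"><span>Label</span><span>123</span></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($content);
my @values;

# Scalar context: look_down returns the first matching element (or undef).
if (my $div = $tree->look_down(_tag => 'div', id => 'target')) {
    # Called on $div, look_down only searches inside that div.
    @values = map { $_->as_text } $div->look_down(_tag => 'span');
}
$tree->delete;
print join(' | ', @values), "\n";
```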

1 vote, 2 answers

Extracting Links in Perl using TreeBuilder

I'm working on a script to extract a bunch of information into one HTML file. I'm having some difficulty extracting ONLY a specific set of links from the page in question, however. Here is a rough structure of the site. There are some other headings…

perl mechanize www-mechanize html-content-extraction html-treebuilder

asked Sep 10 '15 at 19:23 by MikeEMKI


1 vote, 1 answer

WWW::Mechanize Extraction Help - PERL

I'm trying to automate the extraction of a transcript found on a website. The entire transcript is found between dl tags, since the site formatted the interview as a description list. The script I have below allows me to search the site and extract the…

perl parsing screen-scraping www-mechanize html-treebuilder

asked Sep 01 '15 at 17:27 by MikeEMKI

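Here is a sketch of pulling the transcript out of the description list once the page HTML is in hand (for example, from WWW::Mechanize's `content` method). It assumes each speaker is a `<dt>` followed by the matching `<dd>`. The sample markup is hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<dl>
  <dt>Interviewer</dt><dd>How did it start?</dd>
  <dt>Guest</dt><dd>By accident.</dd>
</dl>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my @transcript;

if (my $dl = $tree->look_down(_tag => 'dl')) {
    my $speaker = '';
    # Visit <dt> and <dd> elements in document order, pairing each <dd>
    # with the most recent <dt>.
    for my $item ($dl->look_down(_tag => qr/^d[td]$/)) {
        if ($item->tag eq 'dt') {
            $speaker = $item->as_text;
        }
        else {
            push @transcript, "$speaker: " . $item->as_text;
        }
    }
}
$tree->delete;
print "$_\n" for @transcript;
```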

1 vote, 0 answers

WebKit - getting an HTML element by position

Is there a way in WebKit to get an HTML element (from the DOM) by its position? That is, given X,Y coordinates, I would like to "spy" on the element behind them. I am looking for a C++ API (in WebKit), not a JavaScript way. Thanks.

html webkit position html-treebuilder

asked Aug 05 '14 at 13:04 by user3910427


1 vote, 2 answers

Trying to figure out how to push specific links contained in each link of separate list of links into an array

GENERAL IDEA Here is a snippet of what I'm working with: my $url_temp; my $page_temp; my $p_temp; my @temp_stuff; my @collector; foreach (@blarg_links) { $url_temp = $_; $page_temp = get( $url_temp ) or die $!; $p_temp =…

arrays perl web-scraping web-crawler html-treebuilder

asked Jun 04 '14 at 19:35 by user3707917


1 vote, 1 answer

Combining Class and Nth-Child in Perl TreeBuilder and XPath

I am trying to get the sum of a column in an html table. The first row of this table is all titles. Every cell of every row past the first has the class "right", so I was going to use that class as a selector to ignore the unnecessary titles. …

perl xpath html-treebuilder

asked Mar 10 '14 at 16:25 by user2933738

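In XPath, the nth-child idea is a positional predicate, and it can be chained with the class test. The sketch below assumes HTML::TreeBuilder::XPath and a hypothetical table in which the column to sum is the second one. It selects `td[2][@class="right"]`: the second `<td>` of each row, kept only if it has class `right`. The header row has no class, so it drops out. Predicate order matters here: `td[@class="right"][2]` would instead mean the second of the `right` cells in each row.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<table>
  <tr><td>Name</td><td>Amount</td></tr>
  <tr><td class="right">a</td><td class="right">1.50</td></tr>
  <tr><td class="right">b</td><td class="right">2.25</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Second cell of each row, only when it carries class="right".
my @cells = $tree->findnodes('//tr/td[2][@class="right"]');

my $total = 0;
$total += $_->as_text for @cells;    # cell text is numeric here

$tree->delete;
print "$total\n";
```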


…

html perl html-treebuilder

asked Jun 07 '20 at 02:53 by Displayname71


3 votes, 1 answer

OR match for HTML::TreeBuilder's look_down feature

Trying to match tr items whose class starts with either eve or day. This is my attempt: my @stuff = $p->look_down( _tag => 'tr', class => 'qr/eve*|day*/g' ); foreach (@stuff) { print…

regex perl html-treebuilder

asked May 30 '14 at 01:16 by user3689651

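In the attempt above, `'qr/eve*|day*/g'` is inside quotes. That makes it a literal string, so `look_down` compares the class attribute against that exact text and matches nothing. Passing a real compiled regex (no quotes, and no `/g`, which has no meaning here) makes `look_down` do a pattern match instead. Also, `eve*` means "ev followed by zero or more e". The anchored `^(?:eve|day)` expresses "starts with eve or day". The table rows below are hypothetical sample data.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<table>
  <tr class="evening"><td>A</td></tr>
  <tr class="daytime"><td>B</td></tr>
  <tr class="night"><td>C</td></tr>
</table>
HTML

my $p = HTML::TreeBuilder->new_from_content($html);

# qr// (not a quoted string) tells look_down to regex-match the attribute.
my @stuff = $p->look_down(_tag => 'tr', class => qr/^(?:eve|day)/);

my @texts = map { $_->as_text } @stuff;    # collect before deleting the tree
$p->delete;
print "$_\n" for @texts;
```

If the condition ever outgrows a single regex, `look_down` also accepts a code reference that receives each candidate element.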

2 votes, 2 answers

Not getting output from HTML::TreeBuilder

I'm trying to get a whole bunch of values from around 3,000 HTML files and save them to a spreadsheet. I'm using HTML::TreeBuilder to process the HTML and creating a spreadsheet using Spreadsheet::WriteExcel. But my script doesn't successfully get…

perl html-treebuilder

asked Apr 03 '17 at 19:31 by Ultracrepidarian


2 votes, 2 answers

XPath won't find id

I'm failing to get a node by its id. The code is straightforward and should be self-explanatory. #!/usr/bin/perl use Encode; use utf8; use LWP::UserAgent; use URI::URL; use Data::Dumper; use HTML::TreeBuilder::XPath; my $url =…

perl xpath html-tree html-treebuilder

asked Sep 13 '14 at 16:17 by 3und80


2 votes, 1 answer


Can't get content of