Questions tagged [html-treebuilder]

Parser that builds an HTML syntax tree.

HTML::TreeBuilder is a Perl module that parses HTML from a file or a string and builds an HTML syntax tree.

33 questions
5
votes
2 answers

Perl extract pattern from html file

I have an .html file full of links. I would like to extract the domains without the http:// (just the hostname portion of each link, e.g. blah.com), list them, and remove duplicates. This is what I have come up with so far. I think the issue is the…
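One possible approach to the task described above, as a minimal sketch: it assumes the links are absolute http(s) URLs in `a` elements, and it uses the URI module to pull out the hostname. The helper name `unique_hosts` and the sample markup are hypothetical, made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# Hypothetical helper: return the unique hostnames of all absolute
# http(s) links in an HTML string, in order of first appearance.
sub unique_hosts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my (%seen, @hosts);
    # look_down accepts a regex as an attribute criterion, so
    # relative links (no scheme) are skipped here.
    for my $link ($tree->look_down(_tag => 'a', href => qr{^https?://}i)) {
        my $host = URI->new($link->attr('href'))->host;
        push @hosts, $host unless $seen{$host}++;
    }
    $tree->delete;    # free the tree explicitly
    return @hosts;
}

# Made-up sample input with one duplicate host and one relative link.
my $sample = <<'HTML';
<p><a href="http://blah.com/a">one</a>
<a href="https://example.org/x?y=1">two</a>
<a href="http://blah.com/b">three</a>
<a href="/relative">four</a></p>
HTML

print "$_\n" for unique_hosts($sample);
```

The `%seen` hash does the de-duplication while `@hosts` preserves the order in which hosts first appear; for a real file, `HTML::TreeBuilder->new_from_file` could be used instead of `new_from_content`.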
4
votes
1 answer

Scrape HTML files with Perl, returning content only, in order

Using HTML::TreeBuilder -- or Mojo::DOM -- I'd like to scrape the content but keep it in order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes) But this in TreeBuilder my…
sqldoug
3
votes
2 answers

Use HTML::TreeBuilder in Perl to extract all instances of a specific span class

Trying to make a Perl script to open an HTML file and extract anything contained within tags. Sample HTML: when parsing HTML in Perl
I HAVE SOLVED THIS: Turns out the page I was loading with WWW::Mechanize uses AJAX to load all the content that is inside the
so it is not loaded when I created the $html variable. Now I must see how to get this dynamic content... I am…
AsocPro
1
vote
1 answer

How to loop the result from findnodes() with HTML::TreeBuilder::XPath

I have a script that monitors some Facebook pages. Since the Facebook API revoked public page access permission on 4-SEP-2019, I need to parse the content with XPath. Each Facebook post is wrapped in div[contains(@class,"userContentWrapper")]. I would…
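A minimal sketch of looping over the result of findnodes(), assuming HTML::TreeBuilder::XPath is installed. In list context findnodes() returns the matching nodes, so a plain `for` loop works. The helper name `post_texts`, the inner `p` element, and the sample markup are hypothetical; only the `userContentWrapper` class comes from the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: collect the paragraph text of every post wrapper.
sub post_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my @texts;
    # List context: findnodes() returns a list of element nodes.
    for my $post ($tree->findnodes('//div[contains(@class,"userContentWrapper")]')) {
        # A relative XPath (leading ".") is evaluated against this node only.
        push @texts, $post->findvalue('.//p');
    }
    $tree->delete;
    return @texts;
}

# Made-up markup standing in for a fetched page.
my $sample = <<'HTML';
<div class="userContentWrapper extra"><p>First post</p></div>
<div class="other"><p>Not a post</p></div>
<div class="userContentWrapper"><p>Second post</p></div>
HTML

print "$_\n" for post_texts($sample);
```

Each `$post` is itself an element, so further per-post queries can be made with relative paths rather than searching the whole tree again.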
1
vote
2 answers

XPath nodes text joined by br

How can I join the text nodes between br tags, again separated by br? Here is the XML code:
text1.
text2.
text3.
ad sense code

text4.
ad sense code

textxx.
I…
daliaessam
1
vote
1 answer

Perl Mechanize identify content between span tag within specific div tag

Perl WWW::Mechanize::Firefox has successfully retrieved the contents of the web page, and stored in the scalar variable $content. my $url = 'http://finance.yahoo.com/quote/AAPL/financials?p=AAPL'; $mech->get($url); my $content=…
1
vote
2 answers

Extracting Links in Perl using TreeBuilder

I'm working on a script to extract a bunch of information into one HTML file. I'm having some difficulty extracting ONLY a specific set of links from the page in question, however. Here is a rough structure of the site. There are some other headings…
1
vote
1 answer

WWW::Mechanize Extraction Help - PERL

I'm trying to automate the extraction of a transcript found on a website. The entire transcript is found between dl tags, since the site formatted the interview as a description list. The script I have below allows me to search the site and extract the…
1
vote
0 answers

WebKit - getting an HTML element by position

Is there a way in WebKit to get an HTML element (from the DOM) by its position? That is, given X,Y coordinates, I would like to "spy" the element behind them. I am looking for a C++ API (in WebKit), not a JavaScript approach. Thanks.
1
vote
2 answers

Trying to figure out how to push specific links contained in each link of separate list of links into an array

GENERAL IDEA Here is a snippet of what I'm working with: my $url_temp; my $page_temp; my $p_temp; my @temp_stuff; my @collector; foreach (@blarg_links) { $url_temp = $_; $page_temp = get( $url_temp ) or die $!; $p_temp =…
1
vote
1 answer

Combining Class and Nth-Child in Perl TreeBuilder and XPath

I am trying to get the sum of a column in an html table. The first row of this table is all titles. Every cell of every row past the first has the class "right", so I was going to use that class as a selector to ignore the unnecessary titles. …
user2933738
Displayname71
3
votes
1 answer

OR match for HTML::TreeBuilder's look_down feature

Trying to match tr items that have a class with either the first three letters starting with eve or day. This is my attempt: my @stuff = $p->look_down( _tag => 'tr', class => 'qr/eve*|day*/g' ); foreach (@stuff) { print…
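For reference, a minimal sketch of one way to express this OR match. It assumes the goal is "class begins with eve or day". In the attempt above the pattern is quoted, so look_down receives a plain string rather than a regex; passing an unquoted `qr//` object (without `/g`) lets look_down match the attribute value against it. The helper name `eve_or_day_rows` and the sample table are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: text of every <tr> whose class starts with
# "eve" or "day".
sub eve_or_day_rows {
    my ($html) = @_;
    my $p = HTML::TreeBuilder->new_from_content($html);
    # qr// object, not a quoted string; ^ anchors at the start of the value.
    my @stuff = $p->look_down(_tag => 'tr', class => qr/^(?:eve|day)/);
    my @text  = map { $_->as_text } @stuff;
    $p->delete;
    return @text;
}

# Made-up table: two rows should match, one should not.
my $sample = <<'HTML';
<table>
<tr class="evening"><td>A</td></tr>
<tr class="daytime"><td>B</td></tr>
<tr class="night"><td>C</td></tr>
</table>
HTML

print "$_\n" for eve_or_day_rows($sample);
```

Note that `eve*` in the original pattern means "ev followed by zero or more e", which is looser than a three-letter prefix; the non-capturing group `(?:eve|day)` states the prefix alternatives directly.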
2
votes
2 answers

Not getting output from HTML::TreeBuilder

I'm trying to get a whole bunch of values from around 3,000 HTML files and save them to a spreadsheet. I'm using HTML::TreeBuilder to process the HTML and creating a spreadsheet using Spreadsheet::WriteExcel. But my script doesn't successfully get…
2
votes
2 answers

XPath won't find id

I'm failing to get a node by its id. The code is straightforward and should be self-explanatory. #!/usr/bin/perl use Encode; use utf8; use LWP::UserAgent; use URI::URL; use Data::Dumper; use HTML::TreeBuilder::XPath; my $url =…
3und80
2
votes
1 answer