Questions tagged [mojo-dom]

Minimalistic HTML/XML DOM parser with CSS selectors

12 questions
5
votes
1 answer

Replace all the spaces in content of any tag with `&nbsp;`

Task: Replace all the spaces in content of any tag with `&nbsp;`. y.html (sample file)

Chankey Pathak
4
votes
1 answer

Scrape HTML files with Perl, returning content only, in order

Using HTML::TreeBuilder -- or Mojo::DOM -- I'd like to scrape the content but keep it in order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes). But this in TreeBuilder my…
sqldoug
3
votes
2 answers

Using Mojo::DOM to extract untagged text after heading

I'm trying to extract some untagged text from an HTML file using Mojo::DOM (I'm new at this). In particular, I want the description text after the H2 heading (there are other headings in the file).

Description

This text is the description
3
votes
2 answers

Doctype sniffing with CSS3, and specifically with Mojo::DOM

Can I use Mojo::DOM and its CSS3 selectors to figure out the DOCTYPE of an HTML document? This is related to my other question, "How should I process HTML META tags with Mojo::UserAgent?", where I want to set the character set of a document. I need to know…
brian d foy
2
votes
0 answers

How can I find a specific piece of a webpage using Mojo::DOM?

Below is an extract from an IMDb page; I purposely kept it short. My end goal is to get the 2 links, but I can't even figure out how to get a specific div with an id, because the class below is spread out all over the page. I've…
LuisC329
2
votes
1 answer

Mojo::DOM HTML extraction

I'm trying to extract quite a bit of data from a perfectly structured web page and struggling with Mojo::DOM methods. I would really appreciate it if anyone could point me in the right direction. The truncated HTML with interesting data follows: …
Dvalin Swamp
1
vote
0 answers

Perl Compiling PP issue in Strawberry Perl

I am having issues compiling, on Windows (Strawberry Perl v5.32.0), a script that references a custom module. My Perl skills could be rated as a 3/10, with 10 being the best, and I have researched this problem to the best of my ability. When I run…
1
vote
1 answer

Mojo::DOM extract paragraph after specific previous paragraph

I'm using Mojo::DOM for the first time and having trouble extracting information based on a previous tag. I'm looking for a way to grab 'The description'. #!/usr/bin/perl require v5.10; use feature qw(say); use Mojo::DOM; my $html =…
Gert
1
vote
1 answer

Mojo::DOM and Text Method to remove spaces

I have the following code, which uses Mojo::DOM to get the text: `my $text = $ua->get('https://my_site.org'.$_)->res->dom->at('div.container-fluid h1')->text;` The text under h1 is in the following format:

my_text …

jsor
1
vote
1 answer

Targeting individual elements in HTML using Perl and Mojo::DOM in well-formatted HTML

Relative beginner with Perl, with my first question here, trying the following: I am trying to retrieve certain information from a large online dataset (Eur-Lex), where each HTML document is well-formed HTML, with constant elements. Each HTML file…
Denis_HR
1
vote
2 answers

Mojo::DOM - How to return more than one attribute

I'm new to Mojolicious. To find the title for a link within a p tag with class Module, e.g.

Link Text is here

I use the following code: my $dom = Mojo::DOM->new( $page ); for…
Dr.Avalanche
0
votes
1 answer

CSS selection using Mojo::DOM

This is a multidisciplinary question, so the answer may not be purely CSS. I am parsing a large table and my goal is to retrieve only the text outside of the tags. I am able to select the rows but am stuck on how to only select text outside of…
Not a machine