It seems that the other answers have explained what I wrote in my tutorial post. That said, I wanted to add that I have come to grips with another useful method in Mojo::DOM (actually in the Mojo::Collection class) called pluck. This method reduces the visual complexity of
->map(sub{$_->text})
to
->pluck('text')
Further, I have noticed that at least a few of my each calls were extraneous and that a Mojo::Collection used in a list context will "Do What I Mean" and each automagically.
Edit: I checked this and in fact when used as a string the elements are joined with a newline. As this isn't exactly what I want, I have returned my each calls.
All that said here is how I might write that same tutorial script now:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;
my $dom = Mojo::DOM->new(<<'HTML');
<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
</div>
HTML
say "div days:";
say for $dom->find('div.days')->pluck('text')->each;
say "\nspan hours:";
say for $dom->find('span.hours')->pluck('text')->each;
say "\nOpen Times:";
say for $dom->find('div.openTime')
->map(sub{$_->children->each})
->pluck('text')
->each;
Note that I don't use ->pluck('children') because the children method returns a Mojo::Collection object, meaning that the return from pluck would be a collection of collections. In order to flatten the structure I need to call each on the result of the children call, and thus I cannot remove that particular ->map call.
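To make the "collection of collections" point concrete, here is a minimal sketch. The one-block HTML snippet and the variable names are made up for this illustration; it only uses map, children, each, first and size, so it should not depend on pluck being available.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;

# Hypothetical miniature of the markup above, just for this illustration
my $dom = Mojo::DOM->new(
  '<div class="openTime"><div class="days">Fri</div><span class="hours">10-9</span></div>'
);

# Returning the collection itself: one Mojo::Collection per parent (nested)
my $nested = $dom->find('div.openTime')->map(sub { $_->children });

# Calling each inside map returns the children as a list, which map flattens
my $flat = $dom->find('div.openTime')->map(sub { $_->children->each });

say ref $nested->first;   # Mojo::Collection
say ref $flat->first;     # Mojo::DOM
say $flat->size;          # 2
```

The first element of the nested result is itself a collection, so calling text on it would not do what I want; the flattened result holds the DOM elements directly.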
However, now I wonder if I couldn't avoid this hassle altogether? Mojo::DOM has excellent support for CSS3 selectors (w3schools reference), and one thing I might try would be not to select the parent (div.openTime) directly but to select its children in the selector.
say "\nOpen Times:";
say for $dom->find('div.openTime > *')->pluck('text')->each;
So there is a good lesson here: letting the selector give you something as close as possible to the collection you want saves you from having to transform it later.
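As a quick sanity check that the two routes really agree, here is a small sketch comparing them. The two-block HTML is a made-up cut-down of the markup above, and I use ->map(sub{$_->text}) rather than pluck only so that the sketch does not depend on pluck being present in the installed release.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;

# Hypothetical cut-down version of the venue markup
my $dom = Mojo::DOM->new(<<'HTML');
<div class="timespan"><div class="openTime"><div class="days">Fri</div><span class="hours">10-9</span></div></div>
<div class="timespan"><div class="openTime"><div class="days">Sun</div><span class="hours">10-5</span></div></div>
HTML

# Route 1: select the parents, then flatten their children by hand
my @via_map = $dom->find('div.openTime')
  ->map(sub { $_->children->each })
  ->map(sub { $_->text })
  ->each;

# Route 2: let the child combinator in the selector do the flattening
my @via_selector = $dom->find('div.openTime > *')
  ->map(sub { $_->text })
  ->each;

say "both routes agree" if "@via_map" eq "@via_selector";
```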
To answer your final questions:
To translate this
say for $dom->find('div.openTime')
->map(sub{$_->children->each})
->map(sub{$_->text})
->each;
to more C-esque Perl (though I won't take it to the for(i=0;i<10;i++){ ... } extreme) it might look something like
my @open_times = $dom->find('div.openTime')->each;
my @all_children;
foreach my $elem ( @open_times ) {
  my @children = $elem->children->each;
  push @all_children, @children;
}
my @texts;
foreach my $child ( @all_children ) {
  push @texts, $child->text;
}
foreach my $text ( @texts ) {
  print $text . "\n";
}
I'm sure you can see why I prefer the Mojo (object-chaining) way.
As to your second question: Mojolicious has great (if sometimes oververbose) documentation. Start here to learn about the whole system. Specifically, reading about Mojo::DOM and Mojo::Collection should be enough to handle DOM parsing. I think part of your problem is that you didn't notice the interdependency of the DOM and Collection objects, and so you mistakenly assumed that all the method calls were on DOM objects. When you read carefully you will see that some of the DOM methods (those that might return more than one result) return Collection objects, and find is one such method.
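If you are ever unsure which kind of object you are holding, asking ref is a cheap way to find out. Here is a minimal sketch with a made-up two-element snippet. It contrasts find with at, which returns only the first match as a single DOM object.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;

# Hypothetical snippet, just to have two matching elements
my $dom = Mojo::DOM->new('<div class="days">Fri</div><div class="days">Sun</div>');

my $many = $dom->find('div.days');   # may match several elements
my $one  = $dom->at('div.days');     # first match only

say ref $many;    # Mojo::Collection
say ref $one;     # Mojo::DOM
say $one->text;   # Fri
```

Collection methods such as each and map belong on the first object; DOM methods such as text and children belong on the second.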