Using HTML::TreeBuilder (or Mojo::DOM), I'd like to scrape the content but keep it in document order, so that I can put the text values into an array (and then replace the text values with a variable for templating purposes).
But this in TreeBuilder:

    my $map_r = $tree->tagname_map();
    my @contents = map { $_->content_list } $tree->find_by_tag_name(keys %$map_r);
    foreach my $c (@contents) {
        say $c;
    }

doesn't return the values in document order (of course, hashes aren't ordered). So how can I visit the tree from the root down and keep the sequence of values returned? Recursively walk the tree? Essentially, I'd like something like the `as_text` method, but applied to each element separately. (I followed this nice idea, but I need it for all elements.)
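For what it's worth, the recursive walk I have in mind would look roughly like the sketch below. It assumes that `content_list` returns an element's children in document order, with text segments as plain strings and child elements as `HTML::Element` objects. `text_in_order` is a hypothetical helper name (not part of the module), and the sample markup is assumed for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

# Hypothetical helper: collect text segments depth-first, in document order.
sub text_in_order {
    my ($node) = @_;
    my @out;
    for my $child ( $node->content_list ) {
        if ( ref $child ) {
            # Child is an element: recurse into it.
            push @out, text_in_order($child);
        }
        else {
            # Child is a text segment: trim it and skip whitespace-only pieces.
            ( my $text = $child ) =~ s/^\s+|\s+$//g;
            push @out, $text if length $text;
        }
    }
    return @out;
}

# Assumed sample markup for illustration.
my $html = '<p>some text <b>now bold</b> extra text</p>';
my $tree = HTML::TreeBuilder->new_from_content($html);
my @text = text_in_order($tree);
say for @text;
$tree->delete;
```

Because each text segment is pushed at the moment it is visited, the array keeps the same sequence as the markup, which is what the templating step needs.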
`some text <b>now bold</b> extra text` should be "some text", "now bold", "extra text" (producing a quoted array is not the problem; I can handle that), rather than "some text", "extra text", "now bold", which is what Mojo::DOM gives with `for my $x ( $dom->parse($html)->find('*')->each ) { my $text = $x->text; chomp $text; push @text, $text; }` – sqldoug Sep 09 '15 at 18:05