4

Suppose I have an HTML tree like this:

div
`- ul
   `- li          (*)
   `- li          (*)
   `- li          (*)
   `- li          (*)
      `- ul
         `- li
         `- li
         `- li

How do I select the <li> elements that are marked with (*)? They are direct children of the first <ul> element.

Here is how I find the first <ul> element:

my $ul = $div->look_down(_tag => 'ul');

Now I have $ul, but when I do something like:

my @li_elements = $ul->look_down(_tag => 'li');

It also finds <li> elements that are buried deeper in the HTML tree.

How do I find just the <li> elements that are direct children of the first <ul> element? There is an unknown number of them, so I can't just take the first 4 as in the example.

cjm
bodacydo

3 Answers

8

You can get all the children of an HTML::Element object with the content_list method (it returns every child node, including any text, not only <li> elements), so the child nodes of the first <ul> element in the document would be:

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_file('my.html');

my @items = $tree->look_down(_tag => 'ul')->content_list;

But it is far more expressive to use HTML::TreeBuilder::XPath, which lets you find all <li> children of <ul> children of <div> elements anywhere in the document, like this:

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_file('my.html');

my @items = $tree->findnodes('//div/ul/li')->get_nodelist;
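Because '//div/ul/li' matches under every <div> in the document, it may help to anchor the search to one specific <ul>. The following is only a sketch: the inline HTML is a made-up stand-in for the tree in the question, and it assumes HTML::TreeBuilder::XPath is installed and that findnodes is called on an element with a path relative to that element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample mirroring the tree in the question.
my $html = '<div><ul><li>a</li><li>b</li><li>c</li>'
         . '<li>d<ul><li>x</li><li>y</li><li>z</li></ul></li></ul></div>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Take the first <ul> that is a child of a <div> ...
my ($ul) = $tree->findnodes('//div/ul');

# ... then ask only for its <li> children, relative to that node.
my @direct = $ul->findnodes('./li');

print scalar(@direct), " direct <li> children\n";

$tree->delete;    # free the tree when done
```

The relative path './li' never descends into the nested <ul>, so no extra filtering is needed.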
Borodin
5

If you want to use the look_down method, you can add an extra criterion, a code reference that keeps only the elements whose parent is $ul:

my @li_elements = $ul->look_down(_tag => 'li', sub {$_[0]->parent() == $ul});
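To see the difference the extra criterion makes, here is a small self-contained sketch. The inline HTML is an assumed stand-in for the tree in the question, and the variable names @all and @direct are illustrative only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample mirroring the tree in the question.
my $html = '<div><ul><li>a</li><li>b</li><li>c</li>'
         . '<li>d<ul><li>x</li><li>y</li><li>z</li></ul></li></ul></div>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $div  = $tree->look_down(_tag => 'div');
my $ul   = $div->look_down(_tag => 'ul');    # scalar context: first match

# Without the filter, nested <li> elements are found too.
my @all = $ul->look_down(_tag => 'li');

# With the filter, only elements whose parent is $ul survive.
my @direct = $ul->look_down(_tag => 'li', sub { $_[0]->parent == $ul });

print scalar(@all),    "\n";    # 7
print scalar(@direct), "\n";    # 4

$tree->delete;    # free the tree when done
```

The comparison with == works here because it compares the two object references numerically, so it is true only when both refer to the same element.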
Snorri
0

For completeness, here is one more option:

my @li = grep { ref $_ && $_->tag eq 'li' } $ul->content_list;

(where $ul is your top-level <ul> element; the ref check skips text children, which content_list returns as plain strings rather than objects)

Bintz