
I'm having trouble outputting the contents of a matched node that I'm parsing:

<div class="description">some text <br/>more text<br/></div>

I'm using HTML::TreeBuilder::XPath to find the node (there's only one div with this class):

my $description = $tree->findnodes('//div[@class="description"]')->[0];

It finds the node (returned as an HTML::Element, I believe), but $description->as_HTML includes the element itself too. I just want everything contained inside the element as HTML:

some text <br/>more text<br/>

I can obviously regex strip it out, but that feels messy and I'm sure I'm just missing a function somewhere to do it?

RobEarl
AndyC

2 Answers


Try this:

my $description = $tree->findnodes('//div[@class="description"]/text()')->[0];

This is an XPath trick: `text()` selects the text nodes directly inside the div.

Gilles Quénot
  • That returns an object of type HTML::TreeBuilder::XPath::TextNode, which doesn't have the 'as_HTML' method (and I can't seem to find any docs on what it does provide). – AndyC Feb 06 '13 at 13:35

Use /node() to fetch all child nodes, both text and elements.

my $description = $tree->findnodes('//div[@class="description"]/node()');
Jens Erat
  • It has the same issue as using text(): the returned object is an HTML::TreeBuilder::XPath::TextNode and I'm not sure what to do with it. – AndyC Feb 06 '13 at 14:04
  • This call will return *multiple* nodes (all the nodes contained in the div), so the result is a container holding all of them. It will return a list, or a `Tree::XPathEngine::NodeSet` object in scalar context (which is what you're forcing). You'll probably have to iterate over the result in some way. Also have a look at the `->[0]` at the end; I guess it's wrong here (because you want all nodes, not only the first). I removed it from my answer. – Jens Erat Feb 06 '13 at 15:28
  • Yeah, looking at the list returned, it's a mixture of `HTML::TreeBuilder::XPath::TextNode` and `HTML::Element`, which are lists themselves. It'd be extremely fiddly and annoying just to accomplish what I want, so at this rate I may as well just get rid of the parent tag with a regex! – AndyC Feb 06 '13 at 17:55
  • If you're going to apply a regex, you should be happy with a string anyway? Do you know [`findnodes_as_string`](http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath-0.14/lib/HTML/TreeBuilder/XPath.pm#findnodes_as_string_($path))? – Jens Erat Feb 06 '13 at 19:10
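
For reference, here is a minimal sketch of one way to get the inner HTML without a regex. It is not taken from either answer. It assumes the div's children can be read with `HTML::Element`'s `content_list`, which returns element children as objects and text children as plain strings. The variable `$inner` and the sample markup are illustrative only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample markup copied from the question.
my $html = '<div class="description">some text <br/>more text<br/></div>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Same lookup as in the question: the first (only) matching div.
my $description = $tree->findnodes('//div[@class="description"]')->[0];

# content_list yields the div's children: objects for elements,
# plain strings for text. Serialize elements, keep strings as they are.
my $inner = join '',
    map { ref($_) ? $_->as_HTML : $_ } $description->content_list;

print $inner, "\n";

$tree->delete;    # free the tree once it is no longer needed
```

Note that the text pieces are joined exactly as stored, so they are not entity-encoded again, and `as_HTML` may serialize `<br/>` in a slightly different form than the input.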