perl XML::LibXML get direct child text node content

Question

Given markup like this snippet:

```xml
<p>content 1 of p <span>content of span</span> content 2 of p </p>
```

I would like to obtain only `content 1 of p` and `content 2 of p`, not `content of span`.

Is there a way to do it?

Don't listen to @Nilesh Jain; XML::Simple's own documentation warns you against using it for the reason outlined in [Why is XML::Simple "discouraged"?](https://stackoverflow.com/q/33267765/589924). In short, it's an extremely hard-to-use module. What simply requires `$node->findnodes('text()')` with XML::LibXML would be extremely convoluted and error-prone with XML::Simple. – ikegami, Oct 14 '19 at 09:11
Don't use `XML::Simple`. It had its place many years ago, but its use has been discouraged by its own author, in its own documentation, for many years now. (Go and actually read the _first_ paragraph at the link that @NileshJain supplied.) Its own author wrote a tutorial on another module (`XML::LibXML`, and it's a good tutorial). Use either `XML::LibXML` or `XML::Twig`. – zdim, Oct 14 '19 at 09:41

ikegami · Accepted Answer · 2019-10-14T09:12:43.483


Using an XPath:

```perl
# 'text()' selects only the text nodes that are direct children of $node
for my $text_node ($node->findnodes('text()')) {
   say $text_node;
}
```
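Why this skips the span's text: `text()` is the abbreviated form of `child::text()`, so it only selects text nodes that are direct children of `$node`. The span's text is a grandchild. Changing the step to `.//text()` (all descendants) would pick it up as well. A minimal sketch comparing the two on the question's snippet. The helper name `texts_for` is hypothetical, and it calls each text node's `data` method to get its raw string:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML;

# Hypothetical helper: raw character data of every text node that
# $xpath selects, evaluated relative to $node.
sub texts_for {
    my ($node, $xpath) = @_;
    return map { $_->data } $node->findnodes($xpath);
}

my $xml  = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';
my $node = XML::LibXML->new->parse_string($xml)->documentElement();

# 'text()' means 'child::text()': direct children only, span text skipped.
my @direct = texts_for($node, 'text()');

# './/text()' walks all descendants, so the span's text is included too.
my @all = texts_for($node, './/text()');

say scalar(@direct), ' direct text nodes';       # prints "2 direct text nodes"
say scalar(@all),    ' descendant text nodes';   # prints "3 descendant text nodes"
```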

Without using an XPath:

```perl
for my $child_node ($node->childNodes()) {
   # Skip elements (such as the span) and any other non-text children
   next if $child_node->nodeType != XML_TEXT_NODE;

   say $child_node;
}
```
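One thing to watch with the `nodeType` check: it keeps only nodes of type `XML_TEXT_NODE`. If the input may contain CDATA sections, XML::LibXML keeps those as separate nodes of type `XML_CDATA_SECTION_NODE` by default, so the loop would skip them. A minimal sketch, assuming you want CDATA content treated as text. The sub name `direct_text_nodes` and the sample XML are hypothetical:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( XML_TEXT_NODE XML_CDATA_SECTION_NODE );

# Hypothetical helper: direct text children of $node, accepting CDATA
# sections as text too. Returns each node's raw character data.
sub direct_text_nodes {
    my ($node) = @_;
    my @texts;
    for my $child_node ($node->childNodes()) {
        my $type = $child_node->nodeType;
        next if $type != XML_TEXT_NODE && $type != XML_CDATA_SECTION_NODE;
        push @texts, $child_node->data;   # CDATA nodes inherit data() from Text
    }
    return @texts;
}

my $xml  = '<p>content 1 of p <span>content of span</span><![CDATA[ a <b> ]]></p>';
my $node = XML::LibXML->new->parse_string($xml)->documentElement();

say for direct_text_nodes($node);
```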

Both output the following:

```
content 1 of p
 content 2 of p
```
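The text nodes keep the whitespace around them: the first ends with a space, and the second starts and ends with one. If you want the pieces trimmed or joined, a small post-processing step handles it. The sketch below also calls `data` instead of printing the node. `toString` serializes the node as XML, so a `&` in the text comes back as `&amp;`, whereas `data` returns the raw characters. The helper name `direct_text` and the `&amp; q` added to the sample are hypothetical:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML;

# Hypothetical helper: the direct text of an element, each piece
# trimmed, with whitespace-only pieces dropped.
sub direct_text {
    my ($node) = @_;
    my @parts;
    for my $text_node ($node->findnodes('text()')) {
        my $s = $text_node->data;    # raw characters, not serialized XML
        $s =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
        push @parts, $s if length $s;
    }
    return @parts;
}

# The question's snippet with an &amp; added to the second piece.
my $xml  = '<p>content 1 of p <span>content of span</span> content 2 of p &amp; q </p>';
my $node = XML::LibXML->new->parse_string($xml)->documentElement();

my @parts = direct_text($node);
say for @parts;           # "content 1 of p", then "content 2 of p & q"
say join ' ', @parts;     # both pieces on one line
```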

The rest of the program:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( XML_TEXT_NODE );

my $xml = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';

# Parse the string and start from the root element (the <p>)
my $doc = XML::LibXML->new->parse_string($xml);
my $node = $doc->documentElement();
```

edited Oct 14 '19 at 09:12

answered Oct 14 '19 at 09:04

ikegami


Thanks, @ikegami, this helps a lot. The XPath approach is neat. – jonah_w Oct 14 '19 at 09:18

You can, of course, embed it as part of a larger XPath (e.g. `$doc->findnodes('/p/text()')`). – ikegami Oct 14 '19 at 09:19
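Embedding the step in a longer path also works when a document has several matching elements: `//p/text()` returns the direct text children of every `p`, in document order. If you need the pieces grouped per element, you can instead run the relative `text()` query on each `p`. A minimal sketch, assuming a hypothetical wrapper document with two `p` elements:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML;

# Hypothetical sample document with two <p> elements.
my $xml = <<'XML';
<doc>
  <p>first <span>skip me</span> p</p>
  <p>second <span>skip me too</span> p</p>
</doc>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# One query from the document: direct text children of every <p>.
# Whitespace between <doc> and <p> belongs to <doc>, so it is not selected.
my @texts = map { $_->data } $doc->findnodes('//p/text()');
say for @texts;

# Per-element grouping: run the relative 'text()' query on each <p>.
for my $p ($doc->findnodes('//p')) {
    say join '|', map { $_->data } $p->findnodes('text()');
}
```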
