Given this snippet:

<p>content 1 of p <span>content of span</span> content 2 of p </p>

I would like to obtain only the text that belongs directly to the `<p>` element, that is, `content 1 of p` and `content 2 of p`, but not `content of span`.

Is there a way to do it?

jonah_w
  • Don't listen to @Nilesh Jain; XML::Simple's own documentation warns you against using it for the reason outlined in [Why is XML::Simple "discouraged"?](https://stackoverflow.com/q/33267765/589924). In short, it's an extremely hard-to-use module. What simply requires `$node->findnodes('text()')` with XML::LibXML would be extremely convoluted and error-prone with XML::Simple. – ikegami Oct 14 '19 at 09:11
  • Don't use `XML::Simple`. It had its place many years ago, but its use has been discouraged by its own author, in its own documentation, for many years now. (Go and actually read the _first_ paragraph on the link that @NileshJain supplied.) Its own author wrote a tutorial on another module (`XML::LibXML`, and it's a good tutorial). Use either `XML::LibXML` or `XML::Twig`. – zdim Oct 14 '19 at 09:41

1 Answer

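Using an XPath (`text()` selects only the text nodes that are direct children of `$node`, so the text inside `<span>` is skipped):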

for my $text_node ($node->findnodes('text()')) {
   say $text_node;
}

Without using an XPath:

for my $child_node ($node->childNodes()) {
   next if $child_node->nodeType != XML_TEXT_NODE;

   say $child_node;
}

Both output the following:

content 1 of p
 content 2 of p

The rest of the program, which must come before either loop because it defines `$node`:

use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( XML_TEXT_NODE );

my $xml = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';

my $doc = XML::LibXML->new->parse_string($xml);
my $node = $doc->documentElement();
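
The loops above print each text node as-is, so the whitespace around the `<span>` is kept. If a single cleaned-up string is wanted instead, something along these lines might do. This is only a sketch: the `@parts` array and the `' and '` separator are illustrative choices, not part of the original answer, and it assumes `XML::LibXML` is installed.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML;

my $xml = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';

my $doc  = XML::LibXML->new->parse_string($xml);
my $node = $doc->documentElement();

# Collect the direct text children of <p>, trimming surrounding
# whitespace and skipping any that turn out to be whitespace-only.
# (@parts is a hypothetical name chosen for this sketch.)
my @parts;
for my $text_node ($node->findnodes('text()')) {
   my $text = $text_node->data;   # character data of the text node
   $text =~ s/^\s+|\s+$//g;       # trim leading and trailing whitespace
   push @parts, $text if length $text;
}

say join ' and ', @parts;   # content 1 of p and content 2 of p
```

Using `data` rather than relying on stringification returns the unescaped character data, which matters if the text contains entities such as `&amp;`.
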
ikegami