Having this code:

#!/usr/bin/env perl
use 5.014;
use warnings;
use XML::Twig;

my $twig = XML::Twig->parse( \*DATA );
$twig->set_pretty_print('indented_a');

# 1st search
# this correctly prints all <files> nodes whose <type> is 'release'
$_->print for ( $twig->findnodes( '//type[string()="release"]/..' ) );

# 2nd search    
# try to get first matched only
my $latest = $twig->findnodes( '(//type[string()="release"])[1]/..' );
$latest->print;

__DATA__
<root>
    <files>
        <type>beta</type>
        <ver>3.0</ver>
    </files>
    <files>
        <type>alpha</type>
        <ver>3.0</ver>
    </files>
    <files>
        <type>release</type>
        <ver>2.0</ver>
    </files>
    <files>
        <type>release</type>
        <ver>1.0</ver>
    </files>
</root>

The above prints

  <files>
    <type>release</type>
    <ver>2.0</ver>
  </files>
  <files>
    <type>release</type>
    <ver>1.0</ver>
  </files>
error in xpath expression (//type[string()="release"])[1]/.. around (//type[string()="release"])[1]/.. at /opt/anyenv/envs/plenv/versions/5.24.0/lib/perl5/site_perl/5.24.0/XML/Twig.pm line 3648.

The wanted output from the 2nd search

    <files>
        <type>release</type>
        <ver>2.0</ver>
    </files>

i.e. the first <files> node whose <type> eq 'release'.

According to this answer, the XPath expression (//type[string()="release"])[1]/.. should work, but it seems I have again missed something important.

Could anyone help, please?

cajwine

3 Answers

XML::Twig doesn't support the full XPath syntax. The documentation for the get_xpath method (which is the same as findnodes) says this:

A subset of the XPATH abbreviated syntax is covered:

tag
tag[1] (or any other positive number)
tag[last()]
tag[@att] (the attribute exists for the element)
tag[@att="val"]
tag[@att=~ /regexp/]
tag[att1="val1" and att2="val2"]
tag[att1="val1" or att2="val2"]
tag[string()="toto"] (returns tag elements which text (as per the text method) 
                     is toto)
tag[string()=~/regexp/] (returns tag elements which text (as per the text 
                        method) matches regexp)
expressions can start with / (search starts at the document root)
expressions can start with . (search starts at the current element)
// can be used to get all descendants instead of just direct children
* matches any tag

So subexpressions within parentheses aren't supported, and you may specify only a single predicate.

It's also important that, in scalar context, findnodes will only ever return a count of the number of nodes found. You must use it in list context to retrieve the nodes themselves, which means that a simpler way to find just the first matching element is to write

my ($latest) = $twig->findnodes( '//type[string()="release"]/..' );

which works fine
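A minimal sketch of the list-versus-scalar distinction described above, using plain XML::Twig. The inlined sample document and the variable names (`@releases`, `$count`, `$latest_ver`) are illustrative assumptions, not part of the original question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Sample document, inlined so the sketch is self-contained
my $xml = <<'XML';
<root>
  <files><type>beta</type><ver>3.0</ver></files>
  <files><type>release</type><ver>2.0</ver></files>
  <files><type>release</type><ver>1.0</ver></files>
</root>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
$twig->set_pretty_print('indented_a');

# List context: findnodes returns the matching elements
my @releases = $twig->findnodes('//type[string()="release"]/..');

# Scalar context: per the explanation above, this is a count, not an element
my $count = $twig->findnodes('//type[string()="release"]/..');

# A parenthesised left-hand side forces list context and keeps the first match
my ($latest) = $twig->findnodes('//type[string()="release"]/..');

# $latest is undef when nothing matched, so test it before calling methods
my $latest_ver = defined $latest ? $latest->first_child_text('ver') : undef;

print "matches: $count\n";
$latest->print if defined $latest;
```

Because the XPath itself ends in `/..`, an empty result simply leaves `$latest` undefined, so no method is ever called on a missing node.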

If you really need the full power of XPath, then you can use XML::Twig::XPath instead. This module uses either XML::XPath or the excellent XML::XPathEngine to provide the full XPath syntax by overriding findnodes. (The other methods, get_xpath and find_nodes, continue to use the reduced XML::Twig variation.)

findnodes in scalar context now returns an XML::XPathEngine::NodeSet object which has array indexing overloaded. So you can write

my $latest = $twig->findnodes( '//type[string()="release"]/..' );
$latest->[0]->print;

or just

my ($latest) = $twig->findnodes( '//type[string()="release"]/..' );

as above.
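As a hedged, self-contained sketch of the XML::Twig::XPath route: the sample data is inlined and the variable names (`$by_position`, `$by_predicate`, `$ver_a`, `$ver_b`) are made up for illustration. It assumes XML::Twig::XPath and one of its XPath engines are installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig::XPath;

my $xml = <<'XML';
<root>
  <files><type>beta</type><ver>3.0</ver></files>
  <files><type>release</type><ver>2.0</ver></files>
  <files><type>release</type><ver>1.0</ver></files>
</root>
XML

# Same interface as XML::Twig, but findnodes accepts full XPath
my $twig = XML::Twig::XPath->new;
$twig->parse($xml);
$twig->set_pretty_print('indented_a');

# Parenthesised node-set with a positional predicate, as in the question
my ($by_position) = $twig->findnodes('(//type[string()="release"])[1]/..');

# Nested predicate form, which avoids the trailing parent step
my ($by_predicate) = $twig->findnodes('/root/files[type[string()="release"]]');

# Guard against an empty result before calling element methods
my $ver_a = defined $by_position  ? $by_position->first_child_text('ver')  : undef;
my $ver_b = defined $by_predicate ? $by_predicate->first_child_text('ver') : undef;

$by_position->print if defined $by_position;
```

Both assignments use list context, so each variable holds an element (or undef) rather than a node set.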

Finally, I would prefer /root/files[type[string()="release"]] to the trailing parent::node() step, but that is purely a personal preference.

Borodin
  • YES! Using the `XML::Twig::XPath` and the `my ($latest) = $twig->findnodes( '/root/files[type[string()="release"]]' );` solves my needs. Thank you! ;) – cajwine May 18 '16 at 15:42
  • @cajwine: I hope I made it clear that, if you use only one predicate, like `my ($latest) = $twig->findnodes( '/root/files/type[string()="release"]/..' )`, then the standard `XML::Twig` will work fine – Borodin May 18 '16 at 15:51
  • Yes, I just tried both. `'/root/files[type[string()="release"]]'` (from your last statement) needs `XML::Twig::XPath`, while for `/root/files/type[string()="release"]/..` plain `XML::Twig` is enough. Wonderful answer! ;) – cajwine May 18 '16 at 16:00
  • @cajwine: I'm pleased to have helped. Like I said, it seems to be a limit on the predicates, but that's just an educated guess. `mirod` posted an answer as well and he is the author of the module so you may want to ask him some questions – Borodin May 18 '16 at 16:06

XML::Twig doesn't support all of XPath, but XML::Twig::XPath does.

So use XML::Twig::XPath;, then my $twig = XML::Twig::XPath->parse(... and voilà... you can now get to fixing the $latest=... line, which should be:

my $latest = ($twig->findnodes( '(//type[string()="release"])[1]/..' ))[0];

(The way you have it, $latest is an XML::XPathEngine::NodeSet; you need to take the first element of that set.)

mirod
  • This is rather off-topic, but it would be nice if `XML::Twig::XPath` had a way of specifying which helper module to use if they are both installed, in the same way that `Text::CSV` does. Or at least a way of discovering which one had been chosen. It may initially just be a matter of changing `my $XPATH` into `our $XPATH`? – Borodin May 18 '16 at 15:55
  • No question, just a "thank you" for the nice `XML::Twig` package! :) – cajwine May 18 '16 at 17:06
  • @borodin XML::XPathEngine is used if present. XML::XPath is only an option because historically it was the first one that was used, before I forked the XPath part to create XML::XPathEngine. – mirod May 18 '16 at 21:01
  • @mirod: I get that. Thanks. But unless you're expecting to deprecate `XML::XPath` it would still be nice to be able to get at the information. I've been using `ref $twig->{twig_xp}` so far, which isn't very clean – Borodin May 18 '16 at 21:13

XML::Twig doesn't support the whole XPath. The expression works correctly in XML::LibXML.

You can walk the structure yourself in Perl:

my $latest = ($twig->findnodes('//type[string()="release"]'))[0]->parent;
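The one-liner above calls `parent` on whatever the list slice yields, which fails when nothing matches. A guarded variant might look like the sketch below; `parse_xml` and `first_release_files` are hypothetical helpers written for this example, not part of XML::Twig.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: build a twig from an XML string
sub parse_xml {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);
    return $twig;
}

# Hypothetical helper: first <files> element whose <type> is "release",
# or undef when the document contains no such <type>
sub first_release_files {
    my ($twig) = @_;
    my ($type) = $twig->findnodes('//type[string()="release"]');
    return undef unless defined $type;   # nothing matched, so no parent to fetch
    return $type->parent;
}

my $with_release = parse_xml(
    '<root><files><type>beta</type><ver>3.0</ver></files>'
  . '<files><type>release</type><ver>2.0</ver></files></root>'
);
my $beta_only = parse_xml(
    '<root><files><type>beta</type><ver>3.0</ver></files></root>'
);

my $found   = first_release_files($with_release);
my $missing = first_release_files($beta_only);

print defined $found   ? "found ver " . $found->first_child_text('ver') . "\n" : "no release\n";
print defined $missing ? "unexpected match\n" : "no release in beta-only document\n";
```

Testing the `<type>` element before asking for its parent keeps the walk in Perl while avoiding the method call on an undefined value.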
choroba
  • Walking in Perl works, yes, but it dies with `Can't call method "parent" on an undefined value` if there isn't a `release` type (for example, only `beta`), so the return value needs to be tested. That is why I tried to use the (extended) XPath. Thank you. :) – cajwine May 18 '16 at 15:02