I have an example XML file:

<?xml version="1.0" ?>
<Details date="2022-02-09" ver="1">
<VerNum>/14</VerNum>
<Info>
 <model>S22</model>
 <branch name="city_1">
  <prevstock>10000</prevstock>
  <def>1</def>
 </branch>
 <branch name="city_2">
  <presstock>2000</presstock>
  <def>2</def>
 </branch>
 <branch name="city_3">
  <futstock>3000</futstock>
  <def>0.3</def>
 </branch>
</Info>
</Details>

I need to access the stock value. The tag name is not always consistent, and I can't depend on the position of the node either. I do not understand the correct usage of the XPath ends-with function.

use warnings;
use strict;
use feature 'say';
use Data::Dumper;
use XML::LibXML;

my $file = "ex.xml";#// die "Usage: $0 filename\n";
my $parser = XML::LibXML->load_xml(location => $file);
my %branch_stock;
foreach my $sec ($parser->findnodes('/Details/Info')) { 
    for my $branch ($sec->findnodes('./branch')) {
        my $branch_name = $branch->getAttribute('name');
        my $stock_value = $branch->findnodes('*[ends-with(name(),"stock")]')->[0]->textContent;

        #say "$branch_name --> $stock_value";

        $branch_stock{$branch_name} = $stock_value;
    }   
}

say Dumper \%branch_stock;

This gives me the following error:

 error : xmlXPathCompOpEval: function ends-with not found
XPath error : Unregistered function
XPath error : Stack usage errror
 error : xmlXPathCompiledEval: 2 objects left on the stack.

Could anyone please help me understand the problem and how to work around it? Thanks a lot in advance.

Perl_Newbie
  • `XML::XPath` is stuck with XPath1. `fn:ends-with()` is XPath2 – Gilles Quénot Feb 13 '23 at 18:37
  • I didn't get that: XPath1 and XPath2? Could you kindly elaborate, please? :) Please suggest a way to correct it. – Perl_Newbie Feb 13 '23 at 18:39
  • @GillesQuenot I actually have an [open pull request](https://github.com/manwar/XML-XPath/pull/14) against XML::XPath to add `ends-with()` and others but the maintainer hasn't merged it in yet. (But OP's code doesn't use that module) – Shawn Feb 13 '23 at 20:21
  • Would love to have XPath2 with HTML::TreeBuilder::XPath! One day maybe. Sad that we are stuck with XPath1. That's why I scrape with nodes and https://pptr.dev – Gilles Quénot Feb 13 '23 at 20:36

2 Answers

Use this XPath 1.0 expression, which mimics the fn:ends-with()¹ function:

//branch/node()[substring(local-name(),
    string-length(local-name()) - string-length("stock") + 1)  = "stock"]

So:

my $stock_value = $branch
->findnodes('node()[substring(local-name(),
    string-length(
        local-name()) - string-length("stock") + 1)  = "stock"]'
)->[0]->textContent;
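
To see the emulation end to end, here is a minimal self-contained sketch. It assumes the sample XML from the question, inlined as a string so that it runs without ex.xml. The helper name stock_by_branch is made up for illustration, and the step uses * rather than node(), which restricts the match to element children.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample document from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<?xml version="1.0" ?>
<Details date="2022-02-09" ver="1">
<VerNum>/14</VerNum>
<Info>
 <model>S22</model>
 <branch name="city_1">
  <prevstock>10000</prevstock>
  <def>1</def>
 </branch>
 <branch name="city_2">
  <presstock>2000</presstock>
  <def>2</def>
 </branch>
 <branch name="city_3">
  <futstock>3000</futstock>
  <def>0.3</def>
 </branch>
</Info>
</Details>
XML

# XPath 1.0 stand-in for ends-with(local-name(), "stock"): take the last
# string-length("stock") characters of the name and compare them.
my $ends_with_stock = '*[substring(local-name(), '
    . 'string-length(local-name()) - string-length("stock") + 1) = "stock"]';

# Hypothetical helper: map each branch name to the text of its *stock child.
sub stock_by_branch {
    my ($doc) = @_;
    my %stock;
    for my $branch ($doc->findnodes('/Details/Info/branch')) {
        my ($node) = $branch->findnodes($ends_with_stock);
        next unless defined $node;    # skip branches without a *stock element
        $stock{ $branch->getAttribute('name') } = $node->textContent;
    }
    return \%stock;
}

my $stock = stock_by_branch(XML::LibXML->load_xml(string => $xml));
print "$_ --> $stock->{$_}\n" for sort keys %$stock;
```

Taking the first node in list context and skipping undefined results avoids a run-time error on a branch that has no matching child.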

From Wikipedia (sadly...):

There are several versions of XPath in use. XPath 1.0 was published in 1999, XPath 2.0 in 2007 (with a second edition in 2010), XPath 3.0 in 2014, and XPath 3.1 in 2017. However, XPath 1.0 is still the version that is most widely available.


Related: https://stackoverflow.com/a/40935676/465183

¹ https://www.w3.org/TR/xpath-functions-31/#func-ends-with

Gilles Quénot

That nice ends-with(), along with many other features, only arrived in XPath 2.0. XML::LibXML is limited to XPath 1.0, because the underlying libxml2 library is.

One workable XPath 1.0 function for querying by partial text is contains. Note that it matches the given text anywhere in the name, not only at the end:

use warnings;
use strict;
use feature 'say';

use Data::Dumper;
use XML::LibXML;

my $file = shift // die "Usage: $0 file\n";

my $parser = XML::LibXML->load_xml(location => $file);

my %branch_stock;
for my $sec ($parser->findnodes('/Details/Info')) { 
    for my $branch ($sec->findnodes('./branch')) {
        my $branch_name = $branch->getAttribute('name');

        for my $stock ($branch->findnodes('*[contains(name(),"stock")]')) {
            say "$branch_name --> $stock";
            $branch_stock{$branch_name} = $stock->textContent;

            # XPath 1.0 has no "ends-with", so this would fail:
            # $branch->findnodes('*[ends-with(name(),"stock")]')
        }
    }   
}
print Dumper \%branch_stock;

This prints

city_1 --> <prevstock>10000</prevstock>
city_2 --> <presstock>2000</presstock>
city_3 --> <futstock>3000</futstock>
$VAR1 = {
          'city_3' => '3000',
          'city_1' => '10000',
          'city_2' => '2000'
        };
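
Since contains is looser than ends-with, here is a small sketch of where the two differ. The stocknote element is hypothetical and not part of the question's data; it is there only to provide a name that contains "stock" without ending in it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical branch: <stocknote> is not in the question's data; it is added
# here only to show a name that contains "stock" without ending in it.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<branch name="city_4">
  <stocknote>recount pending</stocknote>
  <prevstock>4000</prevstock>
</branch>
XML

my $branch = $doc->documentElement;

# contains() matches "stock" anywhere in the element name.
my @by_contains = map { $_->nodeName }
    $branch->findnodes('*[contains(name(), "stock")]');

# The substring() comparison only matches names that end in "stock".
# The offset 4 is string-length("stock") - 1.
my @by_suffix = map { $_->nodeName }
    $branch->findnodes('*[substring(name(), string-length(name()) - 4) = "stock"]');

print "contains: @by_contains\n";
print "suffix:   @by_suffix\n";
```

For the data in the question both tests pick the same elements, so contains is enough there; the stricter form only matters if such look-alike names can appear.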

A word on sources and documentation, which I find hard to come by for XPath.

There is an overview in perl-libxml-by-example. While XPath 1.0 lacks the powerful features of later versions, it does have a handful of functions. One can also define custom XPath functions in Perl: XML::LibXML provides XML::LibXML::XPathContext for that, via its registerFunction method.
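
As an illustration of the custom-function route, here is a minimal sketch that registers a Perl callback under the name ends-with using registerFunction from XML::LibXML::XPathContext. The trimmed-down XML and the variable names are assumptions made for the example, and the callback is only a rough approximation of the XPath 2.0 function (no collation argument, no type checks).

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use XML::LibXML::Boolean;

# Trimmed-down sample (an assumption for this sketch). In city_1 the stock
# element is deliberately not the first child.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<Info>
 <branch name="city_1"><def>1</def><prevstock>10000</prevstock></branch>
 <branch name="city_2"><presstock>2000</presstock><def>2</def></branch>
</Info>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);

# Register a Perl callback under the name ends-with (no namespace).
# It returns an XML::LibXML::Boolean so the predicate receives a real
# XPath boolean rather than a string or a number.
$xpc->registerFunction('ends-with', sub {
    my ($string, $suffix) = @_;
    my $matches = $suffix eq ''
        || ( length($string) >= length($suffix)
             && substr($string, -length($suffix)) eq $suffix );
    return $matches ? XML::LibXML::Boolean->True : XML::LibXML::Boolean->False;
});

my %branch_stock;
for my $branch ($xpc->findnodes('/Info/branch')) {
    my $name = $branch->getAttribute('name');
    # The second argument is the context node for the relative expression.
    $branch_stock{$name} = $xpc->findvalue('*[ends-with(name(), "stock")]', $branch);
}
print "$_ --> $branch_stock{$_}\n" for sort keys %branch_stock;
```

The function is only known to this XPathContext object, so queries must go through $xpc rather than through the node's own findnodes.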


If there is indeed exactly one stock element under each branch, as is clearly expected here, we don't need a loop and can pick the "first" (only) element:

for my $sec ($parser->findnodes('/Details/Info')) { 
    for my $branch ($sec->findnodes('./branch')) {
        my $branch_name = $branch->getAttribute('name');

        my $stock = $branch->findnodes('*[contains(name(),"stock")]')->[0];
        say "$branch_name --> $stock";
        $branch_stock{$branch_name} = $stock->textContent;
     }
}

And there is a shortcut for that, findvalue:

for my $sec ($parser->findnodes('/Details/Info')) { 
    for my $branch ($sec->findnodes('./branch')) {
        my $branch_name = $branch->getAttribute('name');

        my $stock_value = $branch->findvalue('*[contains(name(),"stock")]');
        $branch_stock{$branch_name} = $stock_value;
     }
}
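
One difference between the two forms shows up when a branch has no matching element. The sketch below assumes a hypothetical branch (city_5, not in the question's data) without any stock child: findvalue yields an empty string, while the findnodes form needs an explicit check before textContent is called.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical branch with no *stock child; not part of the question's data.
my $doc = XML::LibXML->load_xml(
    string => '<branch name="city_5"><def>7</def></branch>');
my $branch = $doc->documentElement;
my $xpath  = '*[contains(name(), "stock")]';

# findvalue() returns the string value of the result, which is empty here.
my $value = $branch->findvalue($xpath);
print "findvalue gave an empty string\n" if $value eq '';

# findnodes() in list context returns an empty list, so $node is undef and
# can be checked before calling textContent on it.
my ($node) = $branch->findnodes($xpath);
my $stock = defined $node ? $node->textContent : undef;
print "no stock element for ", $branch->getAttribute('name'), "\n"
    unless defined $stock;
```

Which behavior is preferable depends on whether a missing stock element should be silently stored as an empty value or reported.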
zdim