
In the following code, I am trying to parse an SVG file and delete all text nodes in it. However, it does not work: the code never enters the foreach loop over findnodes. What am I doing wrong? I tried both an XML::XPath and an XML::LibXML version of the code, but neither worked. Both parse and dump the file fine, but findnodes matches nothing.

#!/usr/bin/perl

use strict;
use warnings;

use XML::XPath;
use XML::XPath::XMLParser;

my $num_args=$#ARGV+1;
if($num_args != 1) { print "Usage: $0 <filename>\n"; exit(1); }


my $file=$ARGV[0];


my $doc = XML::XPath->new(filename => $file);

foreach my $dead ($doc->findnodes('/svg/text')) {
    print "Found Text Node\n";
    $dead->unbindNode;
}

The first few lines of the SVG file:

<svg
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:cc="http://creativecommons.org/ns#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:svg="http://www.w3.org/2000/svg"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   version="1.1"
   width="675"
   height="832.5"
   id="svg2"
   xml:space="preserve"><metadata
     id="metadata8"><rdf:RDF><cc:Work
         rdf:about=""><dc:format>image/svg+xml</dc:format><dc:type
           rdf:resource="http://purl.org/dc/dcmitype/StillImage" /></cc:Work></rdf:RDF></metadata><defs
     id="defs6" /><g
     transform="matrix(1.25,0,0,-1.25,0,832.5)"
     id="g10"><path
       d="m 54,608.663 450,0 M 54,129.052 l 450,0"
       inkscape:connector-curvature="0"
       id="path12"
       style="fill:none;stroke:#231f20;stroke-width:0.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;stroke-dasharray:none" /><text
       transform="matrix(1,0,0,-1,229.0848,615.9133)"
       id="text14"><tspan


workwise
  • do you have an example of an input file? – mirod Feb 10 '14 at 12:34
  • because your code works for me (except for the unknown `unbindNode` method in XML::XPath of course), so perhaps the SVG is not what you think it is. – mirod Feb 10 '14 at 13:49
  • would you mind giving us a proper input file? The fragment you included in the question is not well-formed, which makes it impossible to parse with an XML tool. Thanks. – mirod Feb 11 '14 at 08:53

1 Answer


/svg/text looks for text elements directly under the svg root element. That is not what you have here: in your file the text element is nested inside a g element. It looks like you want text elements anywhere in the document, which would be //text. This should work with XML::XPath.
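
The difference between the two paths can be sketched on a small, namespace-free document. This is only an illustration: the inline XML is a made-up sample rather than the file from the question, and it uses XML::LibXML instead of XML::XPath to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample without namespaces: two <text> elements sit inside <g>,
# one sits directly under the <svg> root.
my $xml = '<svg><g><text>a</text><text>b</text></g><text>c</text></svg>';
my $doc = XML::LibXML->load_xml(string => $xml);

# /svg/text selects only <text> children of the root element.
my @direct = $doc->findnodes('/svg/text');

# //text selects <text> elements at any depth.
my @anywhere = $doc->findnodes('//text');

printf "direct children: %d, anywhere: %d\n", scalar @direct, scalar @anywhere;
# prints "direct children: 1, anywhere: 3"
```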

If you want to use XML::LibXML, which you should, since it is a much better module than XML::XPath (better maintained, more efficient, more powerful), then you have to pay attention to namespaces: the whole document has a default namespace (the xmlns="http://www.w3.org/2000/svg" bit in the opening tag), so an unprefixed text in an XPath expression does not match its elements. You will need to register that namespace under a prefix of your choice and use XML::LibXML::XPathContext to evaluate the XPath expression, written with that prefix:

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

# it's easier to test directly @ARGV in scalar context than to use $#ARGV
if(@ARGV != 1) { print "Usage: $0 <filename>\n"; exit(1); }

my $file=$ARGV[0];

my $doc = XML::LibXML->load_xml( location => $file);

my $xpc = XML::LibXML::XPathContext->new( $doc);     # create the XPath evaluator
$xpc->registerNs(x => 'http://www.w3.org/2000/svg'); # declare the namespace as x

# the query now uses x as the prefix for the svg namespace
foreach my $dead ($xpc->findnodes('//x:text')) {
    print "Found Text Node\n";
    $dead->unbindNode;
}
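
To see the effect of the default namespace in isolation, here is a minimal sketch on an inline document. The sample XML is an assumption made for illustration, not the file from the question. It compares an unprefixed query, the prefixed one, and a local-name() test that ignores namespaces, then serializes the tree, since unbinding nodes only changes the document in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample that declares the SVG namespace as its default namespace.
my $xml = '<svg xmlns="http://www.w3.org/2000/svg"><g><text>a</text><path d="m 0,0"/></g></svg>';
my $doc = XML::LibXML->load_xml(string => $xml);

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/2000/svg');

my @unprefixed = $xpc->findnodes('//text');                    # "text" in no namespace: no match
my @prefixed   = $xpc->findnodes('//x:text');                  # "text" in the SVG namespace
my @by_local   = $xpc->findnodes('//*[local-name()="text"]');  # any namespace, matched by local name

printf "unprefixed: %d, prefixed: %d, local-name: %d\n",
    scalar @unprefixed, scalar @prefixed, scalar @by_local;
# prints "unprefixed: 0, prefixed: 1, local-name: 1"

# Detach the matched elements, then serialize the modified tree.
$_->unbindNode for @prefixed;
my $out = $doc->toString;
print $out;
```

To save the result to a file instead of printing it, the document's toFile method can be called with a path of your choice.
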
mirod
  • Thank you Sir! For the solution, and the improvement tips! – workwise Feb 11 '14 at 11:56
  • no problem. namespaces are often a pain when processing XML. They are useful when building generic tools, but for most practical cases of XML munging they get in the way and confuse things. Especially default namespaces. – mirod Feb 11 '14 at 11:59
  • Yes! This has got me started on them, will surely need them often I think. – workwise Feb 12 '14 at 05:57
  • Words cannot describe my hate towards XML namespaces. Thanks a lot for this. – Daniel Kamil Kozar Jul 10 '18 at 10:14