3

In a simple removeChild test, the XML element is removed, but an empty blank line remains where it used to be. Why? My source XML file is indented, but even when I remove the indentation I get the same result. What's the point of being able to remove a child node if a blank line is still left behind?

Is there a way to re-format the resulting XML before writing it to the file?

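# Detach every EE1 element from its parent node.
foreach my $XYZ ($doc->findnodes("//EE1"))
{
 my $library = $XYZ->parentNode;
 $library->removeChild($XYZ);
}
# Serialize as-is (format level 0: no re-indentation).
print {$FH} $doc->toString(0);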



RESULT IN OUTPUT FILE:
<?xml version="1.0"?>
<TopTag>
  <AA1>ZNY</AA1>
  <AA2>111</AA2>
  <BB1>
    <CC1>ZNY</CC1>
    <CC2>
      <DD1>
                     <-----blank line remains
        <EE2>2000</EE2>
      </DD1>
      <DD1>
                     <-----blank line remains
        <EE2>5000</EE2>
      </DD1>
    </CC2>
  </BB1>
  <AA1>ZNY2</AA1>
  <AA2>2</AA2>
</TopTag>
nwellnhof
CraigP

3 Answers

5

The empty lines come from text nodes containing whitespace. Consider the following document:

<doc>
  <elem/>
</doc>

The doc element contains the following nodes:

  • A text node containing a newline and two space characters.
  • An element node with the elem element.
  • Another text node containing a newline.

If the elem element is removed, only the two text nodes remain, which shows up as a blank line in the output.
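To see these nodes directly, here is a minimal sketch that parses the same three-line document from a string and lists the children of the root element. The variable names are only illustrative; the newlines are escaped so the whitespace is visible.

```perl
use strict;
use warnings;
use XML::LibXML;

# The example document from above, as a string.
my $xml = "<doc>\n  <elem/>\n</doc>\n";
my $doc = XML::LibXML->load_xml(string => $xml);

# Collect the direct children of <doc>, whitespace text nodes included.
my @children = $doc->documentElement->childNodes;

for my $node (@children) {
    my $kind = $node->isa('XML::LibXML::Text') ? 'text' : 'element';
    (my $shown = $node->toString) =~ s/\n/\\n/g;   # make newlines visible
    print "$kind: '$shown'\n";
}
```

With the default parser settings, the whitespace before and after the elem element is kept as text nodes, which is why removing the element alone leaves a blank line.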

The easiest way to reindent an XML::LibXML document is to use the module XML::LibXML::PrettyPrint. Also have a look at this question.

nwellnhof
  • Thanks for the response. Help me out here: you say "a text node containing a newline and two space characters". Exactly where are these? I'm guessing the newline is right after the <doc> on the first line and the two spaces are before the <elem/> on the second line. But here's my confusion: I used $parser->keep_blanks(0), and when I write to a file there's no extra whitespace; it's just one continuous line. – CraigP Oct 15 '13 at 14:28
  • Also, when I add use XML::LibXML::PrettyPrint; it says the module is not available on the target platform (just like the other poster mentions in your link), and I can't add this module to all the workstations that the script is going to run on. I made use of XML::Twig's pretty_print, but now I have a scenario where I'm using two different parsers (XML::LibXML and XML::Twig). So my next question is: can I easily switch between two parsers within the script? – CraigP Oct 15 '13 at 14:34
  • I was able to use Twig, but I had to go about it the long way: I first had to write out the file from XML::LibXML, then open it with Twig using pretty_print => 'indent'. This worked as far as getting a formatted output file with indents (and no empty rows), but it seems convoluted. – CraigP Oct 15 '13 at 14:39
  • If you use `$parser->keep_blanks(0)`, writing the document with `$doc->toString(1)` should work. – nwellnhof Oct 15 '13 at 14:53
  • OK, sorry for answering my own post. Sticking exclusively with LibXML (no Twig): if I add $parser->keep_blanks(0); and then write the file with print {$FH} $doc->toString(1); I get my desired output, i.e. no blank rows and proper indentation (regardless of whitespace in the source file). – CraigP Oct 15 '13 at 14:59
0

Remove newlines that are preceded by another newline (positive look-behind assertion) and optional whitespace in between.

my $output = $doc->toString(0);
$output =~ s/(?<=\n)\s*\n//g;
print {$FH} $output;
Bruce
  • I don't think it is a good idea to change the string by hand. This code for example will also remove non-optional newlines in text nodes. – user2355282 Jul 07 '16 at 14:19
0

You can use the no_blanks option for load_xml(). It strips whitespace-only text nodes when the XML is parsed:

use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => $filename, no_blanks => 1);

Since the whitespace is removed, you then need to use:

print $dom->toString(1);

to get nicely formatted output.
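Putting the two steps together with the removal from the question, a minimal sketch might look like the following. It parses from a string instead of a file so that it is self-contained, and the sample document and the value inside EE1 are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down, indented sample in the shape of the question's document
# (the EE1 value is a placeholder, not taken from the original file).
my $xml = <<'XML';
<TopTag>
  <DD1>
    <EE1>1000</EE1>
    <EE2>2000</EE2>
  </DD1>
</TopTag>
XML

# no_blanks => 1 drops the whitespace-only text nodes at parse time.
my $dom = XML::LibXML->load_xml(string => $xml, no_blanks => 1);

# Remove every EE1 element, as in the question.
for my $node ($dom->findnodes('//EE1')) {
    $node->parentNode->removeChild($node);
}

# Format level 1 asks the serializer to re-indent the output.
my $output = $dom->toString(1);
print $output;
```

Because the indentation is regenerated on output rather than carried over from the source file, removed elements should leave no blank lines behind. Documents with mixed content (text and elements side by side) may behave differently, since whitespace there can be significant.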

Silvar