
I use this Perl code to read XML from a file and then write it to another file (my full script also has code to add attributes):

#!/usr/bin/perl -w

use strict;
use XML::DOM;
use XML::Simple;

my $num_args = $#ARGV + 1;

if ($num_args != 2) {
  print "\nUsage: ModifyXML.pl inputXML outputXML\n";
  exit;
}

my $inputPath = $ARGV[0];
my $outputPath = $ARGV[1];

open(inputXML, "$inputPath") || die "Cannot open $inputPath \n";

my $parser = XML::DOM::Parser->new();
my $data = $parser->parsefile($inputPath) || die "Error parsing XML File";

open my $fh, '>:utf8', "$outputPath" or die "Can't open $outputPath for writing: $!\n";
$data->printToFileHandle($fh);

close(inputXML);

However, this doesn't preserve characters like encoded line breaks. For example, this XML:

<?xml version="1.0" encoding="utf-8"?>
<Test>
    <Notification Content="test1     testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;" Type="Test1234">
    </Notification>
</Test>

becomes this:

<?xml version="1.0" encoding="utf-8"?>
<Test>
    <Notification Content="test1     testx 

test2

test3

" Type="Test1234">
    </Notification>
</Test>

I suspect I'm not writing to file properly.

Nikaido
Warpin
  • When I think "preserving line breaks" this isn't at all what comes to mind. Here you're looking to preserve *encodings* that coincidentally represent CR/LF characters. – tjd Nov 07 '16 at 19:42
  • It looks like XML::DOM sets a default handler to expand everything (see DOM.pm lines 2054-58). Have you tried fiddling with that to get the noexpand behavior you want? – mghicks Nov 08 '16 at 17:31
  • That part of XML::DOM doesn't seem to quite work right. Thanks for the suggestion, though. – Warpin Nov 11 '16 at 18:51

2 Answers


Use XML::LibXML, for example. The main modules that get involved are XML::LibXML::Parser and XML::LibXML::DOM (along with others). The returned object is generally an XML::LibXML::Document.

use warnings 'all';
use strict;

use XML::LibXML;

my $inputPath  = 'with_encodings.xml';
my $outputPath = 'keep_encodings.xml';

my $reader = XML::LibXML->new();
my $doc = $reader->load_xml(location => $inputPath, no_blanks => 1); 

print $doc->toString();

my $state = $doc->toFile($outputPath);

We don't have to create an object first; we can call XML::LibXML->load_xml directly. I create one here as an example, since this way one can then use methods on $reader to set up parser options (encodings, for example) before parsing but outside of the constructor.
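The two ways of calling load_xml can be sketched side by side as follows. This is a minimal sketch, not the only way to do it: the sample XML is the one from the question, inlined as a string so the snippet is self-contained, and set_option is used here as one way of configuring the parser object after construction. As far as I know, libxml2 writes the CR/LF references back in decimal form (&#13;&#10;) rather than the original hex form, so the output should be equivalent but not necessarily byte-identical to the input.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Test>
    <Notification Content="test1     testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;" Type="Test1234">
    </Notification>
</Test>
XML

# Variant 1: call load_xml directly on the class, passing options inline.
my $doc_a = XML::LibXML->load_xml(string => $xml, no_blanks => 1);

# Variant 2: create the parser object first, then set options on it
# before parsing (outside of the constructor).
my $parser = XML::LibXML->new();
$parser->set_option(no_blanks => 1);
my $doc_b = $parser->load_xml(string => $xml);

# Serialize both documents; the CR/LF characters in the attribute
# should come out as character references, not as literal line breaks.
my $out_a = $doc_a->toString();
my $out_b = $doc_b->toString();
print $out_b;
```

Both variants should produce the same document. The object form is mainly useful when the same parser settings are reused for several files.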

This module is also far more convenient for processing.

XML::Twig should also leave the encodings intact, and is also far better for processing.

zdim

FYI, I was able to do this by switching to a different XML parser. I'm now using XML::LibXML.

The syntax is similar, except that it's 'parse_file' instead of 'parsefile', and instead of 'printToFileHandle' you use 'toFile' with a filename.
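For reference, the original script with those two substitutions might look roughly like this. It is a sketch rather than the exact script I ended up with: copy_xml is a hypothetical helper name introduced only for illustration, and the attribute-adding code is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# copy_xml is a hypothetical helper name, not part of XML::LibXML.
# It parses the input file and writes the document to the output file.
sub copy_xml {
    my ($input_path, $output_path) = @_;

    my $parser = XML::LibXML->new();

    # parse_file replaces XML::DOM's parsefile; it dies on a parse error.
    my $doc = $parser->parse_file($input_path);

    # (code that adds attributes would go here)

    # toFile replaces printToFileHandle and takes a filename directly.
    $doc->toFile($output_path)
        or die "Cannot write $output_path\n";

    return $doc;
}

# Copy only when exactly two arguments are given; otherwise show usage.
if (@ARGV == 2) {
    copy_xml(@ARGV);
}
else {
    warn "Usage: ModifyXML.pl inputXML outputXML\n";
}
```

Note that there is no separate open on the input file here, since parse_file opens the file itself.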

Warpin