I am reading an XML file using XML::Simple.

However, I am facing a rather odd situation in which XML::Simple behaves inconsistently across hosts.

My best guess is that the shell plays some role, but I can't be sure, as I didn't find any such issue documented against XML::Simple.

Any pointers would be a great help in debugging this issue.

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
sub readXml {

    print "XML::Simple version : $XML::Simple::VERSION\n";

    my ($phRec) = eval {XMLin("sample.xml", ForceArray => 1, KeyAttr => [] )};
    if ( $@ ) {
        print $@;
        return 0;
    }
    print Dumper($phRec);
    return 1;
}

readXml();

sample.xml

<?xml version="1.0" encoding="utf-8"?>
<node>
    <people name="whatever">etc</people>
    <people name="abc <whatever> pqr">etc</people>
</node>

I understand this is not valid XML, but I would expect XML::Simple to fail on both hosts.

Host1 [Development host]

bin: perl -v

This is perl 5, version 14, subversion 1 (v5.14.1) built for x86_64-linux ...

bin: echo $SHELL

/bin/bash

bin: ./template

XML::Simple version : 2.18
$VAR1 = {
          'people' => [
                      {
                        'content' => 'etc',
                        'name' => 'whatever'
                      },
                      {
                        'content' => 'etc',
                        'name' => 'abc <whatever> pqr'
                      }
                    ]
        };

Host2 [ VM ]

bash-4.1# perl -v

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi...

bash-4.1# echo $SHELL

/bin/csh

bash-4.1# ./template

XML::Simple version : 2.18
sample.xml:4: parser error : Unescaped '<' not allowed in attributes values
    <people name="abc <whatever> pqr">etc</people>
                      ^
sample.xml:4: parser error : attributes construct error
    <people name="abc <whatever> pqr">etc</people>
...
Soumya
  • XML::Simple is the king of inconsistent behaviour. You'll likely have more fun switching to a proper XML parser. If you really want to debug this, you might want to investigate which XML parser XML::Simple is using – it can [interface with multiple parser backends](https://metacpan.org/pod/release/GRANTM/XML-Simple-2.18/lib/XML/Simple.pm#ENVIRONMENT) which might be the cause of this inconsistency. – amon Aug 26 '17 at 13:21
  • Probably you've got XML::SAX installed on one of the hosts, that might be the reason for differences. – wolfrevokcats Aug 26 '17 at 14:43
  • @amon Thanks Amon - I was completely unaware of the fact that XML::Simple is not a parser on its own. I will definitely switch over to a proper XML parser. I went ahead with debugging the issue and yes - different parsers are used by the hosts. – Soumya Aug 27 '17 at 13:36
  • @wolfrevokcats Upon checking - I could see that the host1 uses XML::SAX::PurePerl and host2 uses XML::LibXML::SAX. However, I could force a consistent behaviour by setting #$PREFERRED_PARSER = 'XML::Parser'; - Thanks. – Soumya Aug 27 '17 at 13:38
  • Re "*I could see that the host1 uses XML::SAX::PurePerl*", Last I checked (a number of years ago), XML::SAX::PurePerl had a major bug related to encodings. (It expected only ASCII or something.) It's also insanely slow. You should edit the config file XML::SAX uses and remove that parser from the list. – ikegami Aug 27 '17 at 18:18
  • Re "*I could force a consistent behaviour by setting #$PREFERRED_PARSER*", You should take this chance to stop using [the most complicated of XML parsers](https://stackoverflow.com/q/33267765/589924) instead! – ikegami Aug 27 '17 at 18:22

1 Answer

The XML parser used by XML::Simple on Host1 is apparently more lenient than the one on Host2.


XML::Simple doesn't actually parse XML. It delegates that task to XML::Parser or XML::SAX. Even then, the latter itself delegates the parsing to one of many other modules.

Not all of those parsers are of the same quality.
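
To see which backends a given host has available, something like the following sketch may help. It assumes XML::SAX may or may not be installed, and the helper names `registered_sax_parsers` and `default_sax_parser` are made up for this illustration. It lists the parsers registered with XML::SAX and asks XML::SAX::ParserFactory which one it would hand out by default.

```perl
use strict;
use warnings;

# Hypothetical helper: names of all parsers registered with XML::SAX on this host.
# Returns an empty list when XML::SAX is not installed.
sub registered_sax_parsers {
    return () unless eval { require XML::SAX; 1 };
    return map { $_->{Name} } @{ XML::SAX->parsers() };
}

# Hypothetical helper: class of the parser XML::SAX::ParserFactory picks by default.
# Returns undef when XML::SAX is missing or no parser can be loaded.
sub default_sax_parser {
    my $parser = eval {
        require XML::SAX::ParserFactory;
        XML::SAX::ParserFactory->parser();
    };
    return defined $parser ? ref $parser : undef;
}

my @names = registered_sax_parsers();
print "Registered SAX parsers on this host:\n";
if (@names) {
    print "  $_\n" for @names;
} else {
    print "  (none found, or XML::SAX is not installed)\n";
}

my $default = default_sax_parser();
print "Default SAX parser: ", (defined $default ? $default : '(unavailable)'), "\n";
```

Running this on both hosts and comparing the output should show whether they resolve to different parser modules.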

Refer to the "Environment" section of XML::Simple's documentation for more info. That section documents a means to select the parser XML::Simple uses. However, you should take this chance to stop using XML::Simple! It's so complicated to use that its own documentation discourages people from using it!
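
If you do stay with XML::Simple for now, a minimal sketch of pinning the backend could look like this. It assumes XML::Parser is installed on every host, and it parses an inline copy of the malformed document from the question rather than reading sample.xml, so the snippet is self-contained.

```perl
use strict;
use warnings;
use XML::Simple;

# Assumption: XML::Parser is installed on every host that runs this script.
# Set this before the first XMLin() call so XML::Simple does not go through XML::SAX.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Inline copy of the malformed document from the question
# (unescaped '<' inside an attribute value).
my $xml = <<'END_XML';
<?xml version="1.0" encoding="utf-8"?>
<node>
    <people name="whatever">etc</people>
    <people name="abc <whatever> pqr">etc</people>
</node>
END_XML

my $rec = eval { XMLin($xml, ForceArray => 1, KeyAttr => []) };
if ($@) {
    print "Parse failed: $@";
} else {
    print "Parsed without error\n";
}
```

With the same backend selected everywhere, the input should be accepted or rejected identically on each host, instead of depending on whichever SAX parser happens to be registered.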

ikegami