
Given the following two files:

doc.xml

<!DOCTYPE TEST [

    <!ENTITY % get_em SYSTEM "entities.ent" >
    %get_em;

]>

<TEST>
        <COMPANY_ID>&COMPANY_ID;</COMPANY_ID>
</TEST>

entities.ent

<!ENTITY COMPANY_ID "84500">
<!ENTITY SPN_FIRM_ID "5900">
<!ENTITY SPN_CUSTD_REL_ID "40001">
<!ENTITY CUSTD_FIRM_NBR "229">
<!ENTITY CUSTD_FIRM_ID "5901">
<!ENTITY MASTERACCOUNT "TAL">

I can successfully use xmllint:

xmllint --loaddtd --noent --dropdtd doc.xml
<?xml version="1.0"?>
<TEST>
        <COMPANY_ID>84500</COMPANY_ID>
</TEST>

How could I get this idea to work in Perl and XML::Simple?

$ perl -MData::Dumper -MXML::Simple -e 'print Dumper XMLin q{doc.xml}'
doc.xml:4: parser error : PEReference: %get_em; not found
    %get_em;
            ^
doc.xml:9: parser error : Entity 'COMPANY_ID' not defined
        <COMPANY_ID>&COMPANY_ID;</COMPANY_ID>
                                ^

After some comments, I tried XML::LibXML::Simple. It looks a little better, but the entity still does not get resolved:

$ perl -MData::Dumper -MXML::LibXML::Simple -e 'print Dumper XMLin q{doc.xml}'
./doc.xml:9: parser error : Entity 'COMPANY_ID' not defined
        <COMPANY_ID>&COMPANY_ID;</COMPANY_ID>
                                ^

The PEReference in the first error stands out: what is a PE? More importantly, how can I get Perl with XML::Simple to read an external DTD?

I tried XML::Simple::DTDReader, but I find this module very restrictive, especially since it states specifically that none of XML::Simple's myriad options are supported!

If I include the ENTITY declarations in doc.xml itself, it DOES work, so XML::Simple clearly knows how to handle the DOCTYPE. I would just like to use an external DTD with SYSTEM, and that is where I'm stuck.

lzc
  • Why not use a decent parser instead? – ikegami Dec 30 '15 at 17:02
  • @ikegami The particular XML configs I'm dealing with are *simple*! The only complexity I found with the current setup is that there are many copies of the same basic XML in which just a couple of variables change. In the spirit of staying simple, I wish to introduce the concept of a variable by using XML ENTITY refs, and I hope to include them with `SYSTEM`. Thanks – lzc Dec 30 '15 at 17:10
  • What does that have to do with my question? If you want something simple, then run, [run far away from XML::Simple](http://stackoverflow.com/q/33267765/589924). – ikegami Dec 30 '15 at 17:16
  • Here is why `XML::Simple` is a bad idea: [Why is XML::Simple discouraged](http://stackoverflow.com/questions/33267765/why-is-xmlsimple-discouraged) - there are NO tasks that are made easier by using it, and plenty that are made MASSIVELY more complicated. – Sobrique Dec 30 '15 at 17:22
  • @Sobrique, your link is excellent, and I am still using it to convince my higher-ups; however, I happen to be dealing with a [dissenter](http://stackoverflow.com/a/33285874/3299282) – lzc Dec 30 '15 at 18:23
  • What about `XML::LibXML::Simple` ? I edited the question showing that it does look a little better but does not solve my original intent .. thanks guys – lzc Dec 30 '15 at 18:24

2 Answers


XML::LibXML will expand entities by default, so you can use

$ perl -e'
    use Data::Dumper qw( Dumper );
    use XML::LibXML  qw( );
    use XML::Simple  qw( XMLin );

    my $xml = XML::LibXML->new()->parse_file("doc.xml")->toString();
    my $doc = XMLin($xml);
    print(Dumper($doc));
'
$VAR1 = {
          'COMPANY_ID' => '84500'
        };

This can also be achieved with XML::LibXML::Simple by overriding XML::Simple-compatibility settings.

$ perl -e'
    use Data::Dumper        qw( Dumper );
    use XML::LibXML::Simple qw( XMLin );

    my $doc = XMLin("doc.xml",
        ParserOpts => {
            load_ext_dtd    => 1,
            ext_ent_handler => undef,
        },
    );
    print(Dumper($doc));
'
$VAR1 = {
          'COMPANY_ID' => '84500'
        };
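
For comparison, here is a minimal sketch of reading the value with XML::LibXML alone, without going through XML::Simple at all. The "PE" in the `PEReference` error stands for parameter entity, which is what `%get_em;` is. The helper name `company_id` is hypothetical, and both parser options are set explicitly on the assumption that the defaults may differ between XML::LibXML versions.

```perl
use strict;
use warnings;
use XML::LibXML qw( );

# Hypothetical helper: parse the file with external DTD loading and entity
# expansion turned on, then return the text of /TEST/COMPANY_ID.
sub company_id {
    my ($path) = @_;

    my $parser = XML::LibXML->new(
        load_ext_dtd    => 1,   # follow SYSTEM references such as entities.ent
        expand_entities => 1,   # replace &COMPANY_ID; with its declared value
    );

    my $doc = $parser->parse_file($path);
    return $doc->findvalue('/TEST/COMPANY_ID');
}

# Usage: perl script.pl doc.xml
print company_id($ARGV[0]), "\n" if @ARGV;
```

Run against the doc.xml and entities.ent from the question, this should print the same 84500 that the examples above produce, but the value comes from an XPath lookup rather than from an XML::Simple-style hash.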
ikegami
  • Nice! it works `perl -MData::Dumper -MXML::LibXML::Simple -e 'print Dumper XMLin (q{doc.xml}, ParserOpts => {load_ext_dtd => 1, ext_ent_handler => undef})'` Thanks – lzc Jan 03 '16 at 17:17
  • Yeah, except you said you had to use XML::Simple... That's the worst of the possible solutions. Not only are you not using XML::Simple, you're forcing another parser to produce the same broken output as XML::Simple – ikegami Jan 03 '16 at 21:37

I'm still looking into whether this can be done within Perl itself, but a simple way is to combine what I found with xmllint and pass its output as a file handle to XMLin:

$ perl -MData::Dumper -MXML::Simple -e 'open my $fh, "xmllint --loaddtd --noent --dropdtd doc.xml |"; print Dumper XMLin $fh'
$VAR1 = {
          'COMPANY_ID' => '84500'
        };
lzc