How do I validate an XML document using XML::LibXML when the DTD is available over HTTPS?
Test code:
#!/usr/bin/perl -w
use XML::LibXML;
use strict;
my $xml = XML::LibXML->load_xml(IO => \*DATA);
my $dtd = XML::LibXML::Dtd->new( "-//NLM//DTD LinkOut 1.0//EN", "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" );
my $https_is_valid = $xml->is_valid( $dtd );
print "HTTPS dtd: ", ref $dtd, "\n Is valid: $https_is_valid\n";
my $dtd_http = XML::LibXML::Dtd->new( "-//NLM//DTD LinkOut 1.0//EN", "http://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" );
my $http_is_valid = $xml->is_valid( $dtd_http );
print "HTTP dtd: ", ref $dtd_http, "\n Is valid: $http_is_valid\n";
__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
<!ENTITY base.url "https://some.domain.com">
<!ENTITY icon.url "https://some.domain.com/logo.png">
]>
<LinkSet>
  <Link>
    <LinkId>1</LinkId>
    <ProviderId>XXXX</ProviderId>
    <IconUrl>&icon.url;</IconUrl>
    <ObjectSelector>
      <Database>PubMed</Database>
      <ObjectList>
        <ObjId>1234567890</ObjId>
      </ObjectList>
    </ObjectSelector>
    <ObjectUrl>
      <Base>&base.url;</Base>
      <Rule>/1/</Rule>
    </ObjectUrl>
  </Link>
</LinkSet>
The code above produces the following output:
HTTPS dtd:
Is valid: 0
HTTP dtd: XML::LibXML::Dtd
Is valid: 1
The DTD fails to load from the HTTPS URL, and therefore cannot be used to validate the XML.
I've downloaded the DTD over HTTPS and checked for HTTP redirects - there aren't any.
I've also had a look at XML::LibXML::InputCallback but can't see how I can incorporate it with XML::LibXML::Dtd->new( ... ).
How should I implement this validation?
The DTD is available over HTTP so I could just use that to validate, but this feels like I'm avoiding the problem rather than solving it properly!
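One direction I'm considering, as a rough sketch rather than a confirmed fix: as far as I can tell, the HTTP client built into libxml2 does not handle HTTPS, so instead of passing the URL to XML::LibXML::Dtd->new I could fetch the DTD text myself and hand it to XML::LibXML::Dtd->parse_string. The helper names fetch_dtd_text and is_valid_against are hypothetical, and the sketch assumes HTTP::Tiny has IO::Socket::SSL available for HTTPS and that the DTD does not pull in further external files by relative path.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use XML::LibXML;

# Hypothetical helper: download the DTD text over HTTPS.
# HTTP::Tiny needs IO::Socket::SSL and Net::SSLeay installed for https URLs.
sub fetch_dtd_text {
    my ($url) = @_;
    my $response = HTTP::Tiny->new->get($url);
    die "Failed to fetch $url: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

# Hypothetical helper: build a Dtd object from text already in memory
# and validate a parsed document against it.
sub is_valid_against {
    my ( $xml_doc, $dtd_text ) = @_;
    my $dtd = XML::LibXML::Dtd->parse_string($dtd_text);
    return $xml_doc->is_valid($dtd) ? 1 : 0;
}
```

With these in place, the check from my test code would become something like is_valid_against( $xml, fetch_dtd_text("https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd") ). I have not confirmed that this is the intended way to do it, which is partly why I'm asking.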