
I'm trying to write a script that validates an XML document against the NITF DTD, http://www.iptc.org/std/NITF/3.4/specification/dtd/nitf-3-4.dtd. Based on this post, I came up with the following simple script to validate a NITF XML document. Below is the error message I get when the script is run; it isn't very descriptive, which makes it hard to debug. Any help is appreciated.

#!/usr/bin/env python


def main():
    from lxml import etree, objectify
    from StringIO import StringIO

    f = open('nitf_test.xml')
    xml_doc = f.read()
    f.close()

    f = open('nitf-3-4.dtd')
    dtd_doc = f.read()
    f.close()

    dtd = etree.DTD(StringIO(dtd_doc))
    tree = objectify.parse(StringIO(xml_doc))
    dtd.validate(tree)


if __name__ == '__main__':

    main()

Traceback error message:

Traceback (most recent call last):
  File "./test_nitf_doc.py", line 23, in <module>
    main()
  File "./test_nitf_doc.py", line 16, in main
    dtd = etree.DTD(StringIO(dtd_doc))
  File "dtd.pxi", line 43, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:126056)
  File "dtd.pxi", line 117, in lxml.etree._parseDtdFromFilelike (src/lxml/lxml.etree.c:126727)
lxml.etree.DTDParseError: error parsing DTD

If I change the line:

dtd = etree.DTD(StringIO(dtd_doc))

To:

dtd = etree.DTD(dtd_doc)

The error I get is:

lxml.etree.DTDParseError: failed to load external entity "NULL"
Brent O'Connor
  • Please post the XML you are trying to validate. – Liza Daly Mar 31 '11 at 01:33
  • This `failed to load external entity "NULL"` error message is quite misleading. It is really because you passed a string to DTD's constructor instead of a file object, but the error message does not help at all in understanding this. – fviktor Jul 13 '11 at 19:24

1 Answer

I took a look at nitf-3-4.dtd and found that it references an external module, xhtml-ruby-1.mod, which can be downloaded at this link. That file needs to be present in the current directory so the DTD parser can load it.
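When the DTD fails to load, the exception text alone says little, but the exception also carries lxml's error log, which may point to the line or resource that caused the failure. A minimal sketch, assuming a hypothetical helper `load_dtd` and a deliberately truncated inline DTD standing in for the NITF one:

```python
from io import StringIO
from lxml import etree


def load_dtd(dtd_text):
    """Hypothetical helper: return (dtd, messages); dtd is None on failure."""
    try:
        return etree.DTD(StringIO(dtd_text)), []
    except etree.DTDParseError as exc:
        # exc.error_log holds the individual parser messages that sit
        # behind the generic "error parsing DTD" exception text.
        return None, ["line %d: %s" % (e.line, e.message) for e in exc.error_log]


# Illustrative input only: a declaration cut off mid-way, so parsing fails.
dtd, messages = load_dtd("<!ELEMENT a")
for message in messages:
    print(message)
```

Printing the log this way is only a debugging aid; the fix for the NITF case is still to make xhtml-ruby-1.mod available to the parser.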

Full working example (assuming you have a valid NITF document handy):

% wget http://www.iptc.org/std/NITF/3.4/specification/dtd/nitf-3-4.dtd
% wget http://www.iptc.org/std/NITF/3.4/specification/dtd/xhtml-ruby-1.mod

Python code:

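from lxml import etree, objectify

# Pass open file objects, not strings; xhtml-ruby-1.mod must be in the current directory.
with open('nitf-3-4.dtd', 'rb') as dtd_file:
    dtd = etree.DTD(dtd_file)
with open('nitf_test.xml', 'rb') as xml_file:
    tree = objectify.parse(xml_file)
print(dtd.validate(tree))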

Output:

% python nitf_test.py
True
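If `validate` returns False instead, the DTD object's `error_log` lists what went wrong. A small sketch using a toy inline DTD and toy documents (both are illustrative assumptions, not NITF content):

```python
from io import StringIO
from lxml import etree

# Toy DTD: <a> must contain exactly one empty <b>.
dtd = etree.DTD(StringIO("<!ELEMENT a (b)>\n<!ELEMENT b EMPTY>"))

good = etree.XML("<a><b/></a>")
bad = etree.XML("<a><c/></a>")

good_ok = dtd.validate(good)
bad_ok = dtd.validate(bad)
print(good_ok, bad_ok)

# error_log reflects the most recent validate() call.
errors = list(dtd.error_log.filter_from_errors())
for entry in errors:
    print("line %d: %s" % (entry.line, entry.message))
```

The same loop over `dtd.error_log.filter_from_errors()` can be placed after `dtd.validate(tree)` in the NITF script to see which elements or attributes were rejected.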
samplebias