
How do I validate an XML document against a compact RELAX NG schema in Python?

Jean-Francois T.
esamatti
  • possible duplicate of [Validating with an XML schema in Python](http://stackoverflow.com/questions/299588/validating-with-an-xml-schema-in-python) – viam0Zah Nov 26 '10 at 18:01
    @TörökGábor that question does not ask about relax ng – oob Apr 15 '12 at 03:40

2 Answers


How about using lxml?

From the lxml documentation (note that this example uses the XML syntax of RELAX NG, not the compact syntax):

>>> from io import StringIO
>>> from lxml import etree
>>> f = StringIO('''\
... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
...  <zeroOrMore>
...     <element name="b">
...       <text />
...     </element>
...  </zeroOrMore>
... </element>
... ''')
>>> relaxng_doc = etree.parse(f)
>>> relaxng = etree.RelaxNG(relaxng_doc)

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
Maxim Sloyko
  • Thanks! Although lxml does not support the compact syntax, it's possible to convert it to XML with Trang http://www.thaiopensource.com/relaxng/trang.html – esamatti Aug 11 '09 at 08:51
  • Is there a way to get useful feedback when the document is not valid, instead of just False? – Mads Skjern Sep 17 '12 at 07:47
  • If anybody comes along and gets to this point and wonders if there is an answer to the previous question, the answer is yes. The list of issues is kept in `relaxng.error_log` (in the context of the code above). – Michael Tiller Mar 17 '13 at 12:19
  • Thanks @MichaelTiller for the `error_log` info. Still, I have found lxml's RELAX NG validation rather poor: it does not point to the failing part of the validated document properly (it reports the location "xml:1:0" for all errors). It does not compare well to the reporting provided by `jing`. – Jan Vlcinsky Oct 05 '14 at 18:36
  • @MaximSloyko Note that the question asks (in the text, not in the title) about validation using a **compact** RELAX NG schema, and the answer shows only the XML variant, so it does not answer the question. – Jan Vlcinsky May 06 '15 at 08:17
  • From `https://lxml.de/validation.html` "libxml2 does not currently support the RelaxNG Compact Syntax. However, if rnc2rng is installed, lxml 3.6 and later can use it internally to parse the input schema. It recognises the .rnc file extension and also allows parsing an RNC schema from a string using RelaxNG.from_rnc_string()." I couldn't get this to work, though. Any hints? – Ketil Malde Jan 26 '23 at 08:51
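Following up on the `error_log` comment: a minimal sketch, assuming lxml is installed, of collecting the line, column and message of each validation error instead of just getting False. The helper name `relaxng_errors` is hypothetical, and the schema is the XML-syntax one from the answer. As noted in the comments, the locations libxml2 reports are not always precise.

```python
from lxml import etree

# Same schema as in the answer (XML syntax, not compact syntax).
RNG = b"""\
<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b">
      <text />
    </element>
  </zeroOrMore>
</element>
"""

relaxng = etree.RelaxNG(etree.fromstring(RNG))


def relaxng_errors(xml_bytes):
    """Validate xml_bytes; return a list of (line, column, message) tuples.

    An empty list means the document is valid.
    """
    doc = etree.fromstring(xml_bytes)
    if relaxng.validate(doc):
        return []
    # error_log holds the errors from the most recent validate() call.
    return [(e.line, e.column, e.message) for e in relaxng.error_log]


print(relaxng_errors(b"<a><b></b></a>"))  # []
for line, col, msg in relaxng_errors(b"<a><c></c></a>"):
    print(f"{line}:{col}: {msg}")
```

If you prefer an exception, `relaxng.assertValid(doc)` raises `etree.DocumentInvalid` for an invalid document.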

If you want to validate a document against a compact RELAX NG schema from the command line, you can use pyjing, from the jingtrang package.

It supports .rnc files and displays more details than just True or False. For example:

C:\>pyjing -c root.rnc invalid.xml
C:\invalid.xml:9:9: error: element "name" not allowed here; expected the element end-tag or element "bounds"

NOTE: jingtrang is a Python wrapper around the Java tools Jing and Trang, so it requires Java to be installed.
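If you want to call pyjing from a script and use its diagnostics programmatically, here is a rough sketch. It assumes pyjing is on the PATH, Java is installed, and that diagnostics follow the `file:line:column: level: message` shape shown in the example above. `run_pyjing` and `parse_jing_output` are hypothetical helper names, not part of jingtrang.

```python
import re
import subprocess

# Matches jing-style diagnostics such as
#   C:\invalid.xml:9:9: error: element "name" not allowed here; ...
# The non-greedy file group lets Windows drive letters ("C:\") through.
JING_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): (?P<level>\w+): (?P<msg>.*)$"
)


def parse_jing_output(text):
    """Turn jing's text output into a list of dicts, skipping other lines."""
    results = []
    for raw in text.splitlines():
        m = JING_LINE.match(raw)
        if m:
            d = m.groupdict()
            d["line"] = int(d["line"])
            d["col"] = int(d["col"])
            results.append(d)
    return results


def run_pyjing(schema_rnc, xml_path):
    """Run `pyjing -c` and return (exit_code, parsed diagnostics).

    Assumption: stdout and stderr are combined so the sketch does not
    depend on which stream jing writes its messages to.
    """
    proc = subprocess.run(
        ["pyjing", "-c", schema_rnc, xml_path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, parse_jing_output(proc.stdout + "\n" + proc.stderr)
```

Usage would be something like `code, diags = run_pyjing("root.rnc", "invalid.xml")`, after which each entry in `diags` has `file`, `line`, `col`, `level` and `msg` keys.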

If you want to validate from within Python, you can:

  1. Use pytrang (from the jingtrang package) to convert the compact RELAX NG schema (.rnc) to the XML syntax (.rng): pytrang root.rnc root.rng

  2. Use lxml to load the converted .rng file, as described here: https://lxml.de/validation.html#relaxng

Something like this (assuming root.rnc is the compact form of the schema from the first answer):

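>>> from io import StringIO
>>> from subprocess import check_call
>>> from lxml import etree

>>> # Pass the command as a list so it also works without a shell on Linux/macOS.
>>> check_call(["pytrang", "root.rnc", "root.rng"])
0

>>> relaxng_doc = etree.parse("root.rng")
>>> relaxng = etree.RelaxNG(relaxng_doc)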

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
Jean-Francois T.