I've been validating XML files against their associated schemas with lxml, but I found that it does not (or appears not to) catch the case where an attribute of type IDREF has no matching attribute of type ID anywhere in the document.
For example:
Attribute "internalRefId" is of type "IDREF" and attribute "id" is of type "ID".
In the schema:
<xs:attribute name="internalRefId" type="xs:IDREF"/>
<xs:attribute name="id" type="xs:ID"/>
When used in an XML file:
....See <internalRef internalRefId="T0003" internalRefTargetType="irtt02"/>....
There should be a corresponding element with the attribute id="T0003", such as
<table id="T0003">
This is a problem if there is no id="T0003" anywhere in the XML document. From my understanding, enforcing that match is the purpose of declaring attributes as IDREF/ID types, so an XML validator should catch it.
But I can't figure out how to get the lxml validator to do this. If I remove, for example, the table element above, lxml validation does not catch it and still considers the document valid.
My validation code is similar to the following:
from lxml import etree as ET
# Parse the document to be validated
tree = ET.parse(filePath)
# Compile the XSD and validate the parsed document against it
schema = ET.XMLSchema(ET.parse(somePathtoSchemaXsdFile))
isValid = schema.validate(tree)
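To make clear what I expect the validator to enforce, here is a rough sketch of the check done by hand. It is only an illustration, not what lxml does internally: the attribute names are hard-coded from the schema fragment above (an assumption, since the parsed tree does not tell me which attributes are typed ID or IDREF), the helper name find_dangling_idrefs and the sample documents are made up, and it uses the standard library's xml.etree.ElementTree so it runs on its own. The same iter() and get() calls exist on lxml elements.

```python
import xml.etree.ElementTree as ET

# Assumption: attribute names are hard-coded from the schema fragment above,
# because the parsed tree carries no schema type information.
ID_ATTR = "id"
IDREF_ATTR = "internalRefId"


def find_dangling_idrefs(root):
    """Return, sorted, the IDREF values with no matching ID in the document."""
    # Collect every value declared through the ID attribute
    ids = {el.get(ID_ATTR) for el in root.iter() if el.get(ID_ATTR) is not None}
    # Collect every value referenced through the IDREF attribute
    refs = {el.get(IDREF_ATTR) for el in root.iter() if el.get(IDREF_ATTR) is not None}
    return sorted(refs - ids)


# Hypothetical sample documents: one with the target table, one without it
good = ET.fromstring(
    '<doc><para>See <internalRef internalRefId="T0003" '
    'internalRefTargetType="irtt02"/></para><table id="T0003"/></doc>'
)
bad = ET.fromstring(
    '<doc><para>See <internalRef internalRefId="T0003" '
    'internalRefTargetType="irtt02"/></para></doc>'
)

print(find_dangling_idrefs(good))  # []
print(find_dangling_idrefs(bad))   # ['T0003']
```

I would rather not maintain something like this myself, since it duplicates knowledge that is already in the schema.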
- If lxml does verify IDREF/ID matches, what do I need to do to enable that?
- If lxml does not handle this, the reason why would be nice to know, but more importantly, what options do I have? The environment is primarily Python based and free is preferred, but other solutions are on the table.