Automatic repair of an XML document is not possible in the general case.
Only in very limited contexts would the repair needed to make an XML document valid be automatically discernible from a given validation error. There is no one-to-one mapping between validation errors and ways of remedying them.
Consider an element r with children a through e:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="r">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="a"/>
        <xsd:element name="b"/>
        <xsd:element name="c"/>
        <xsd:element name="d"/>
        <xsd:element name="e"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
An XML document such as this one,
<r>
  <a/>
  <x/>
  <b/>
  <c/>
  <d/>
  <e/>
</r>
would yield a validation message such as the following from Xerces-J:
[Error] try.xml:5:7: cvc-complex-type.2.4.a: Invalid content was found
starting with element 'x'. One of '{b}' is expected.
Here you might automatically remove x, and all would be fine. (Or you might insert a b, which would not be fine: x would still be unexpected where c is required.)
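To make the first case concrete, here is a minimal Python sketch that models the content of r as the fixed sequence a, b, c, d, e and reports the first problem found, with messages loosely modeled on the Xerces-J output. The function first_error and its message strings are hypothetical simplifications for illustration, not Xerces-J's actual logic or API.

```python
# Content model of <r> from the XSD: the sequence a, b, c, d, e, each exactly once.
EXPECTED = ["a", "b", "c", "d", "e"]


def first_error(children):
    """Return a message for the first problem found, or None if the children are valid.

    Hypothetical, simplified stand-in for a validator: walk the children
    against the sequence and stop at the first mismatch.
    """
    for i, name in enumerate(children):
        if i >= len(EXPECTED):
            return (f"Invalid content was found starting with element '{name}'. "
                    "No child element is expected at this point.")
        if name != EXPECTED[i]:
            return (f"Invalid content was found starting with element '{name}'. "
                    f"One of '{{{EXPECTED[i]}}}' is expected.")
    if len(children) < len(EXPECTED):
        return ("The content of element 'r' is not complete. "
                f"One of '{{{EXPECTED[len(children)]}}}' is expected.")
    return None


print(first_error(["a", "x", "b", "c", "d", "e"]))
# Invalid content was found starting with element 'x'. One of '{b}' is expected.

# Repair 1: remove x. The document is now valid.
print(first_error(["a", "b", "c", "d", "e"]))  # None

# Repair 2: insert b where b was expected. x is still in the way.
print(first_error(["a", "b", "x", "b", "c", "d", "e"]))
# Invalid content was found starting with element 'x'. One of '{c}' is expected.
```

Both repairs respond to the same message, yet only one of them produces a valid document.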
However, for the same XSD, consider that this XML document,
<r>
  <a/>
  <c/>
  <d/>
  <e/>
</r>
would yield a validation message such as the following from Xerces-J:
[Error] try.xml:5:7: cvc-complex-type.2.4.a: Invalid content was found
starting with element 'c'. One of '{b}' is expected.
If you automatically removed c, your document would still be invalid, and you would receive a similar message about d being unexpected. This would continue until your document looked like this,
<r>
  <a/>
</r>
at which point the error would be back where it started, still asking for b, only now because the content of r is incomplete:
[Error] try.xml:5:5: cvc-complex-type.2.4.b: The content of element
'r' is not complete. One of '{b}' is expected.
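The trap described above can be sketched as a naive repair loop that treats every error as naming the culprit and deletes it. This is a minimal Python illustration, assuming the content of r is the fixed sequence a, b, c, d, e; first_unexpected and naive_repair are hypothetical helpers, not part of any real validator.

```python
# Content model of <r> from the XSD: the sequence a, b, c, d, e, each exactly once.
EXPECTED = ["a", "b", "c", "d", "e"]


def first_unexpected(children):
    """Index of the first child that does not fit the sequence, or None."""
    for i, name in enumerate(children):
        if i >= len(EXPECTED) or name != EXPECTED[i]:
            return i
    return None


def naive_repair(children):
    """Hypothetical 'repair': keep deleting whichever element is flagged as
    unexpected, as if each error message identified the real mistake."""
    children = list(children)
    removed = []
    while (i := first_unexpected(children)) is not None:
        removed.append(children.pop(i))
    return children, removed


# Works when the flagged element really is the mistake:
print(naive_repair(["a", "x", "b", "c", "d", "e"]))
# (['a', 'b', 'c', 'd', 'e'], ['x'])

# Fails when the real mistake is a missing element: c, d and e are all
# deleted, and <r> is left with only <a/>, which is still incomplete.
print(naive_repair(["a", "c", "d", "e"]))
# (['a'], ['c', 'd', 'e'])
```

The loop stops only because no element is left to flag; the remaining document is still invalid.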
As you can see, there's simply not enough information available in a given validation error to know how to repair the XML document in general.
You could do better by consulting the XSD, but this is extremely complex and still not guaranteed to uniquely determine the exact mistake made by the authoring person or system. Automatic repair of an XML document, even given an XSD, is not possible in the general case.
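Consulting the XSD can be sketched as a search for the cheapest set of edits that makes the children match the content model. The Python below is a minimal illustration under strong simplifying assumptions: the content model is the fixed sequence a, b, c, d, e, the only edits are deleting or inserting an element, and every edit costs 1. minimal_repairs is a hypothetical helper; real content models (choices, occurrence bounds, nesting, attributes) make this search far harder. The third input, a document with b and c swapped, is a hypothetical example added for illustration.

```python
from functools import lru_cache

# Content model of <r> from the XSD: the sequence a, b, c, d, e, each exactly once.
EXPECTED = ("a", "b", "c", "d", "e")


def minimal_repairs(children):
    """Return (cost, scripts): every cheapest edit script, using only
    deletions and insertions, that turns `children` into EXPECTED."""
    children = tuple(children)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Cheapest scripts turning children[i:] into EXPECTED[j:].
        if i == len(children) and j == len(EXPECTED):
            return 0, [()]
        options = []
        if i < len(children) and j < len(EXPECTED) and children[i] == EXPECTED[j]:
            options.append(solve(i + 1, j + 1))  # element already fits: no edit
        if i < len(children):
            cost, scripts = solve(i + 1, j)
            options.append((cost + 1, [(f"delete {children[i]}",) + s for s in scripts]))
        if j < len(EXPECTED):
            cost, scripts = solve(i, j + 1)
            options.append((cost + 1, [(f"insert {EXPECTED[j]}",) + s for s in scripts]))
        best = min(cost for cost, _ in options)
        return best, [s for cost, ss in options if cost == best for s in ss]

    return solve(0, 0)


print(minimal_repairs(["a", "x", "b", "c", "d", "e"]))  # (1, [('delete x',)])
print(minimal_repairs(["a", "c", "d", "e"]))            # (1, [('insert b',)])
print(minimal_repairs(["a", "c", "b", "d", "e"]))
# (2, [('delete c', 'insert c'), ('insert b', 'delete b')])
```

With the schema in hand, both original documents get the intended fix. For the swapped document, however, two repairs tie: "c was misplaced" and "b was misplaced". Nothing in the schema says which one the author meant, and "cheapest" is itself only a guess about the mistake.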