How to clean an XML file by removing all elements not present in a provided XSD?
This does not work:
using System.Xml;
using System.Xml.Schema;

class Program
{
    public static void Main()
    {
        // Load the schema and configure a validating reader.
        XmlTextReader xsdReader = new XmlTextReader(@"books.xsd");
        XmlSchema schema = XmlSchema.Read(xsdReader, null);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas.Add(schema);
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

        // Copy the input to the output while validating.
        XmlReader xmlReader = XmlReader.Create(@"books.xml", settings);
        XmlWriter xmlWriter = XmlWriter.Create(@"books_clean.xml");
        xmlWriter.WriteNode(xmlReader, true);
        xmlWriter.Close();
        xmlReader.Close();
    }

    // Intended to skip any element that fails validation.
    private static void ValidationCallBack(object sender, ValidationEventArgs args)
    {
        ((XmlReader)sender).Skip();
    }
}
When I run the code above, instead of removing all the "junk" tags, it removes only the first junk tag and leaves the second one.
As for why I need to clean the file: I am using an old SQL Server 2012 instance that requires the XML to match the XSD exactly, even if the extra elements in the XML are not used by the application. I do not have control over the source XML, which is produced by a third-party tool with an unpublished XSD.
Sample Files:
Books.xsd
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bookstore">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="title"/>
              <xs:element type="xs:float" name="price"/>
            </xs:sequence>
            <xs:attribute type="xs:string" name="genre" use="optional"/>
            <xs:attribute type="xs:string" name="ISBN" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Books.xml
<bookstore>
  <book genre='novel' ISBN='10-861003-324'>
    <title>The Handmaid's Tale</title>
    <price>19.95</price>
    <junk>skdjgklsdg</junk>
    <junk2>skdjgklsdg</junk2>
  </book>
  <book genre='novel' ISBN='1-861001-57-5'>
    <title>Pride And Prejudice</title>
    <price>24.95</price>
    <junk>skdjgssklsdg</junk>
  </book>
</bookstore>
Code mostly copied from: Validating an XML against referenced XSD in C#