
I have a C# script that validates an XML document against an XSD document, as follows:

    static bool IsValidXml(string xmlFilePath, string xsdFilePath)
    {

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas.Add(null, xsdFilePath);
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Compile();

        try
        {
            XmlReader xmlRead = XmlReader.Create(xmlFilePath, settings);
            while (xmlRead.Read())
            { };
            xmlRead.Close();
        }
        catch (Exception e)
        {
            return false;
        }

        return true;
    }

I put this together after looking at a number of MSDN articles and questions here where this is given as the solution. It does correctly check that the XSD is well-formed (it returns false if I mess with the file) and that the XML is well-formed (it also returns false when messed with).

I've also tried the following, but it does the exact same thing:

    static bool IsValidXml(string xmlFilePath, string xsdFilePath)
    {

        XDocument xdoc = XDocument.Load(xmlFilePath);
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(null, xsdFilePath);

        try
        {
            xdoc.Validate(schemas, null);
        }
        catch (XmlSchemaValidationException e)
        {
            return false;
        }

        return true;
    }

I've even pulled a completely random XSD off the internet and thrown it into both scripts, and it still validates on both. What am I missing here?

Using .NET 3.5 within an SSIS job.

chazbot7
  • You've not provided any specifics, but if you're validating XML with random schemas then this is probably expected. The best you'll get is a warning if the document doesn't have any matching elements in the schema. – Charles Mager Jan 20 '16 at 21:42
  • Possible duplicate of [Validating XML documents with XSD correctly](http://stackoverflow.com/questions/16755058/validating-xml-documents-with-xsd-correctly) – Charles Mager Jan 20 '16 at 21:42
  • Check that the namespace in the XML document matches the one being targeted by the schema. It might be helpful to post an example of the schema and XML files you are trying to validate. – SCB Jan 20 '16 at 21:43
  • Possible duplicate of [Xdocument.Validate is always successful](http://stackoverflow.com/questions/17232575/xdocument-validate-is-always-successful) – JerryM Jan 20 '16 at 21:59
  • @CharlesMager Looking at the duplicate you suggested now, that looks promising, thank you. – chazbot7 Jan 20 '16 at 22:22
  • @CharlesMager so if the namespaces on the XSD and the XML don't match, the validation will still pass because there's nothing to compare. Am I understanding that correctly? – chazbot7 Jan 20 '16 at 22:41

1 Answer

In .NET you have to check for yourself whether the validator actually matched a schema component; if it did not, no exception is thrown, so your code will not behave as you expect.

A match means one or both of the following:

  • The schema set contains a global element whose qualified name is the same as your XML document element's qualified name.
  • The document element has an xsi:type attribute whose value is a qualified name pointing to a global type in your schema set.

In streaming mode, you can do this check easily. This sketch should give you an idea (error handling not shown, etc.):

    using (XmlReader reader = XmlReader.Create(xmlfile, settings))
    {
        reader.MoveToContent();
        XmlSchemaSet schemas = settings.Schemas;
        // Element test: is the document element a global element in the schema set?
        var qn = new XmlQualifiedName(reader.LocalName, reader.NamespaceURI);
        bool matched = schemas.GlobalElements.Contains(qn);
        // Type test: does an xsi:type attribute point to a global type?
        string xsiType = reader["type", XmlSchema.InstanceNamespace];
        if (xsiType != null)
        {
            // Resolve the "prefix:local" value to an XmlQualifiedName.
            string[] parts = xsiType.Split(':');
            var tn = parts.Length == 2
                ? new XmlQualifiedName(parts[1], reader.LookupNamespace(parts[0]))
                : new XmlQualifiedName(parts[0], reader.LookupNamespace(string.Empty));
            matched = matched || schemas.GlobalTypes.Contains(tn);
        }
        // If matched, keep reading; otherwise set your error flag and stop here.
    }

You might also consider XmlNode.SchemaInfo, which exposes the post-schema-validation infoset assigned to a node as a result of schema validation. I would test different conditions and see how it works for your scenario. The first method is recommended to reduce the attack surface for DoS attacks, as it is the fastest way to detect completely bogus payloads.
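As a complement, and following the comment above that the best you get for an unmatched document is a warning, here is a minimal sketch that treats warnings as failures. It assumes that enabling `XmlSchemaValidationFlags.ReportValidationWarnings` makes the reader report the "could not find schema information" case through the `ValidationEventHandler`. The class name `SchemaCheck` and the string-based signature are hypothetical, chosen only to keep the example self-contained; adapt it to file paths as needed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaCheck
{
    // Hypothetical helper: returns true only if the document is read with
    // no validation errors AND no validation warnings.
    public static bool IsValidXml(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader);
        }
        schemas.Compile(); // a broken schema throws here; not caught in this sketch

        bool ok = true;
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        // Without this flag, an element with no matching declaration is skipped silently.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        // With a handler attached, errors and warnings arrive here instead of being thrown.
        settings.ValidationEventHandler += (sender, e) => ok = false;

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (ok && reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            return false; // the XML itself is not well-formed
        }
        return ok;
    }
}
```

Whether a warning should count as a failure is a design choice. It is reasonable when every document is expected to match the schema, but it may be too strict if the schema deliberately allows lax wildcard content.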

Petru Gardea