Background:

We're building an application that allows our customers to supply data in a predefined XML format (i.e. one we don't control). The XSD is supplied to us by a third party, and we expect to receive an XML file that passes schema validation before we process it.

The Problem:

The XSD that we are supplied with includes a default and target namespace, which means that if a customer supplies an XML file that doesn't include the namespace, the validation will pass, because the validator finds no schema that applies to the un-namespaced elements and so reports no errors. We obviously don't want files to be reported as passing when they shouldn't, but the bigger concern is the mass of additional checks that we will need to do on each element if I can't find a way to make the XML validation work.

The Questions:

Is it possible to force .NET to perform validation and ignore the namespace on the supplied XML and XSD, i.e. in some way "assume" that the namespace was attached?

  1. Is it possible to remove the namespaces in memory, easily, and reliably?
  2. What is the best practice in these situations?

Solutions that I have so far:

  1. Remove the namespace from the XSD every time it's updated (which shouldn't be very often). This doesn't get around the fact that if they supply a namespace it will still pass validation.
  2. Remove the namespace from the XSD, AND find a way to strip the namespace from the incoming XML every time. This seems like a lot of code to perform something simple.
  3. Do some pre-qualification on the XML file before it is validated to ensure that it has the correct namespace. It seems wrong to fail them due to an invalid namespace if the contents of the file are correct.
  4. Create a duplicate XSD that doesn't have a namespace, however if they just supply the wrong namespace, or a different namespace, then it will still pass.

Example XSD:

<?xml version="1.0"?>
<xsd:schema version='3.09' elementFormDefault='qualified' attributeFormDefault='unqualified' id='blah' targetNamespace='urn:schemas-blah.com:blahExample' xmlns='urn:blah:blahExample' xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
...
</xsd:schema>

Example XML with a namespace that is different:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="urn:myCompany.com:blahExample1" attr1="2001-03-03" attr2="google" >
...
</root>

Example XML without a namespace at all:

<?xml version="1.0" encoding="UTF-8"?>
<root attr1="2001-03-03" attr2="google" >
...
</root>
Martin
  • XML namespaces are a good thing, why fight it? – M.Babcock Jan 04 '12 at 13:51
  • It's something that we can't control. I want to make sure that customers are sending the correct XML, however, if a customer misses out the namespace declaration in their submitted XML then I would like to say that we can still validate it. I don't want to just say "You messed up, now fix it!" (and yes, I would use better words, but you get the idea). – Martin Jan 04 '12 at 14:18

2 Answers

While trying to solve the same problem, I came up with what I think is a fairly clean solution. For clarity, I have omitted some validation of the input parameters.

First, the scenario: there is a web service that receives a file that is supposed to be "well-formed" XML and valid against an XSD. Of course, we trust neither its well-formedness nor its validity against the XSD that we know to be the correct one.

The code for such a web service method is presented below; I think it's self-explanatory.

The main point of interest is the order in which the validations happen: you don't check the namespace before loading the document, you check it afterwards, but cleanly.

I decided I could live with some exception handling, as it's expected that most files will be "good" and because that's the framework's way of dealing with errors (so I won't fight it).

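using System;
using System.Data;
using System.IO;
using System.Web.Services;
using System.Xml;
using System.Xml.Schema;

private DataTable xmlErrors;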
[WebMethod]
public string Upload(byte[] f, string fileName) {
    string ret = "This will have the response";

    // this is the namespace that we want to use
    string xmlNs = "http://mydomain.com/ns/upload.xsd";

    // you could put a public url of xsd instead of a local file
    string xsdFileName = Path.Combine(Server.MapPath("~"), "shiporder.xsd");

    // a simple table to store the eventual errors 
    // (more advanced ways possibly exist)
    xmlErrors = new DataTable("XmlErrors");
    xmlErrors.Columns.Add("Type");
    xmlErrors.Columns.Add("Message");

    try {
        XmlDocument doc = new XmlDocument(); // create a document

        // bind the document, namespace and xsd
        doc.Schemas.Add(xmlNs, xsdFileName); 

        // if we wanted to validate if the XSD has itself XML errors
        // doc.Schemas.ValidationEventHandler += 
        // new ValidationEventHandler(Schemas_ValidationEventHandler);

        // Declare the handler that will run on each error found
        ValidationEventHandler xmlValidator = 
            new ValidationEventHandler(Xml_ValidationEventHandler);

        // load the document 
        // will throw XmlException if the document is not well-formed
        doc.Load(new MemoryStream(f));

        // Check if the required namespace is present
        if (doc.DocumentElement.NamespaceURI == xmlNs) {

            // Validate against xsd 
            // will call Xml_ValidationEventHandler on each error found
            doc.Validate(xmlValidator);

            if (xmlErrors.Rows.Count == 0) {
                ret = "OK";
            } else {
                // you could return the complete error list; this is just to prove it works
                ret = "File has " + xmlErrors.Rows.Count + " xml errors ";
                ret += "when validated against our XSD.";
            }
        } else {
            ret = "The xml document has incorrect or no namespace.";                
        }
    } catch (XmlException ex) {
        ret = "XML Exception: probably xml not well formed... ";
        ret += "Message = " + ex.Message;
    } catch (Exception ex) {
        ret = "Exception: probably not XML related... ";
        ret += "Message = " + ex.Message;
    }
    return ret;
}

private void Xml_ValidationEventHandler(object sender, ValidationEventArgs e) {
    xmlErrors.Rows.Add(new object[] { e.Severity, e.Message });
}

Now, the XSD would have something like:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="shiporder"
    targetNamespace="http://mydomain.com/ns/upload.xsd"
    elementFormDefault="qualified"
    xmlns="http://mydomain.com/ns/upload.xsd"
    xmlns:mstns="http://mydomain.com/ns/upload.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
    <xs:simpleType name="stringtype">
      <xs:restriction base="xs:string"/>
    </xs:simpleType>
    ...
</xs:schema>

And the "good" XML would be something like:

<?xml version="1.0" encoding="utf-8" ?>
<shiporder orderid="889923"  xmlns="http://mydomain.com/ns/upload.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <names>Ola Nordmann</names>
    <address>Langgt 23</address>

I tested three cases: malformed XML, input that is invalid according to the XSD, and an incorrect namespace.

references:

Read from memorystream

Trying avoid exception handling checking for wellformness

Validating against XSD, catch the errors

Interesting post about inline schema validation


Hi Martin, the comment section is too short for my answer, so I'll give it here. It may or may not be a complete answer; let's improve it together :)

I made the following tests:

  • Test: xmlns="blaa"
  • Result: the file gets rejected, because of wrong namespace.
  • Test: xmlns="http://mydomain.com/ns/upload.xsd" and xmlns:a="blaa" and the elements had "a:someElement"
  • Result: The file returns an error saying it's not expecting "a:someElement"
  • Test: xmlns="http://mydomain.com/ns/upload.xsd" and xmlns:a="blaa" and the elements had "someElement" with some required attribute missing
  • Result: The file returns an error saying that the attribute is missing

The strategy followed (which I prefer) was: if the document doesn't comply, don't accept it, but give some information on the reason (e.g. "wrong namespace").

This strategy seems contrary to what you previously said:

however, if a customer misses out the namespace declaration in their submitted XML then I would like to say that we can still validate it. I don't want to just say "You messed up, now fix it!"

In this case, it seems you can just ignore the namespace defined in the XML. To do that, you would skip the check for the correct namespace:

    ...
    // Don't Check if the required namespace is present
    //if (doc.DocumentElement.NamespaceURI == xmlNs) {

        // Validate against xsd 
        // will call Xml_ValidationEventHandler on each error found
        doc.Validate(xmlValidator);

        if (xmlErrors.Rows.Count == 0) {
            ret = "OK - is valid against our XSD";
        } else {
            // you could return the complete error list; this is just to prove it works
            ret = "File has " + xmlErrors.Rows.Count + " xml errors ";
            ret += "when validated against our XSD.";
        }
    //} else {
    //    ret = "The xml document has incorrect or no namespace.";                
    //}
    ...
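
One caveat with skipping the check, offered as a hedged note: with a validating XmlReader, elements for which no schema is loaded (a wrong namespace, or none at all) are reported only as warnings, and only when the ReportValidationWarnings flag is set. The sketch below shows one way to surface that case. The class and method names (StrictValidation, Validate) are illustrative, and passing the XML and XSD as strings is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class StrictValidation
{
    // Validates xml against the schema text in xsd and returns every
    // error and warning raised. targetNs must match the XSD's targetNamespace.
    public static List<string> Validate(string xml, string xsd, string targetNs)
    {
        var problems = new List<string>();

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(targetNs, XmlReader.Create(new StringReader(xsd)));

        // Without this flag, elements that match no loaded schema are skipped silently.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler +=
            (sender, e) => problems.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the document drives the validation
        }
        return problems;
    }
}
```

With the flag set, a document whose root element has no namespace should produce a warning about missing schema information, which the caller can choose to treat as a rejection.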


Other ideas...

In a parallel line of thought, you could replace the supplied namespace with your own. Note that doc.DocumentElement.NamespaceURI is read-only, so it cannot simply be assigned a value such as "mySpecialNamespace"; the elements would have to be recreated in the desired namespace.
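
A minimal sketch of that idea using LINQ to XML, which rebuilds each element under the new namespace instead of assigning to an existing one. The names NamespaceRewriter and MoveToNamespace are hypothetical, and the sketch assumes that forcing every element into one namespace is acceptable for your format; it does not adjust prefixed names that appear inside attribute values or text.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceRewriter
{
    // Returns a copy of source in which every element is placed in targetNs,
    // whatever namespace (or none) it had before. Ordinary attributes are kept;
    // xmlns declarations are dropped so the old prefixes disappear.
    public static XElement MoveToNamespace(XElement source, XNamespace targetNs)
    {
        return new XElement(
            targetNs + source.Name.LocalName,
            source.Attributes().Where(a => !a.IsNamespaceDeclaration),
            source.Nodes().Select(n =>
                n is XElement child ? (object)MoveToNamespace(child, targetNs) : n));
    }
}
```

The rewritten tree can then be validated, for example by wrapping it in an XDocument and calling the Validate extension method from System.Xml.Schema. Bear in mind that this makes a document with the wrong namespace indistinguishable from one with the right namespace, which may or may not be what you want.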

Reference:

add-multiple-namespaces-to-the-root-element

Luís Osório
  • Have you tested this with them providing a namespace that isn't the one you add, and also with the one you add? Additionally, the issue we had is that if they do provide a namespace with a prefix (e.g. xmlns:a="blah"), we couldn't remove it properly and add our own. – Martin Jun 16 '12 at 20:26
  • The issue with removing the namespace check is that the validator then won't find anything to validate. If you add a namespace and the nodes have a namespace prefix, then they won't be validated (to my knowledge). I'll have to have a think about how to iterate the elements and remove their prefixes... must be possible... – Martin Jun 25 '12 at 22:11
  • @Martin Sorry, as I'm enforcing the correct ns I failed to see what you noted :). A possible solution could be to start by checking whether any namespace is being used, with `doc.DocumentElement.NamespaceURI`, and then, if there is a namespace, remove it as suggested here: [remove-namespace-prefix](http://stackoverflow.com/questions/3052030/how-to-remove-namespace-prefix-c). If you think this could be the way, we could write some code to demonstrate this. – Luís Osório Jun 26 '12 at 11:23
  • I'll take a look tonight and report back. – Martin Jun 26 '12 at 14:49
  • Did you ever try this out @Martin? I'm having this same issue. – Rick Minerich Oct 27 '16 at 21:32
  • Unfortunately not, I'm afraid; I've moved on since then. – Martin Nov 03 '16 at 21:54

The whole point of an XSD schema is that it turns untyped XML into strongly typed XML.

An XML type can be defined as the combination of node name and namespace.

If someone sends you XML with no namespace then, despite their intentions, the XML does not refer to the types defined by the XSD schema.

From an XML validation perspective the XML is valid as long as

  1. It is well formed
  2. It conforms to any typed XML definition, as specified by the xmlns attribute
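
To make the "node name plus namespace" definition concrete, here is a hedged sketch of checking that pair on the root element before validating. RootCheck and HasExpectedRoot are illustrative names, not part of any library, and the sketch assumes the document arrives as a stream.

```csharp
using System.IO;
using System.Xml;

public static class RootCheck
{
    // Reads only as far as the root element and compares its local name and
    // namespace with the expected pair. Throws XmlException if the input is
    // not well-formed up to that point.
    public static bool HasExpectedRoot(Stream xml, string localName, string namespaceUri)
    {
        using (var reader = XmlReader.Create(xml))
        {
            reader.MoveToContent(); // skips the XML declaration, comments and whitespace
            return reader.NodeType == XmlNodeType.Element
                && reader.LocalName == localName
                && reader.NamespaceURI == namespaceUri;
        }
    }
}
```

Because only the start of the document is read, the check is cheap, and a failure can be reported to the sender as "wrong or missing namespace" rather than as a list of schema errors.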
tom redfern
  • So best practice says to reject the XML not having the correct (or any) namespace definition. How would I check that the XML that has been received is strongly typed then? – Martin Jan 04 '12 at 14:19
  • You could check the namespace and root node name combination to see if the xml you were being sent was of the correct type. – tom redfern Jan 04 '12 at 14:45
  • Is there an elegant way to do that in C#? – Martin Jan 04 '12 at 15:15
  • Here are some ways. None of them are what I would call "elegant" http://www.hanselman.com/blog/GetNamespacesFromAnXMLDocumentWithXPathDocumentAndLINQToXML.aspx – tom redfern Jan 04 '12 at 16:20
  • It just feels dirty, I'll wait to see if anyone has a better idea, but if not, I suppose your answer will need to be it. – Martin Jan 04 '12 at 16:51