
I am using the C# XmlSerializer to deserialize some XML that has an xmlns declaration in one part of it (note that I truncated the CipherValue to fit this post):

<EncryptedData
  Id="ZbjUzHbD37LI2DEuiEGX6A7PSnQ+19dutLPiDxZqnFY=3NLz2QA5KCiXVlJSXejhDQ=="
  Type="http://www.w3.org/2001/04/xmlenc#Element" xmlns="http://www.w3.org/2001/04/xmlenc#"
  length="44">
  <EncryptionMethod
    Algorithm="http://www.w3.org/2001/04/xmlenc#aes256-cbc" />
  <CipherData>
    <CipherValue>3NLz2QA5KCiXVlJSXejhDZYYa9sLbv/w42....+PsLMCfRFN//StgbYoRqno3WQ==</CipherValue>
  </CipherData>
</EncryptedData>

With this XML loaded into Visual Studio, VS highlights the "+" character in the Id attribute, and the existence of the length attribute as being errors. I assume that the only way that VS could know this, is if it went and examined the URLs in the Type and xmlns attributes. VS making this type of Internet request is sort of OK to me, as I have given VS permission to do things like check for updates etc, so I already know that it will be visiting the Internet on its own terms.

However, the above XML doesn't deserialize in my command line program unless I remove the xmlns (or force a blank namespace via a custom XML Text Reader), so I am assuming that my command line program is also verifying the xmlns by visiting that URL.

This is slightly troubling to me: although I understand what an xmlns URL is, I haven't explicitly given my program permission to visit the Internet. In addition, the use case of this program is to run locally and analyze some XML generated by another local-only program. The idea that it could be making Internet requests was way off my radar.

As well as deserializing this XML, I am also doing some XSLT using the C# XslCompiledTransform class. What I finally realized there was that, when performing a transform, the xmlns attribute is not something that you can manipulate with XSLT, as the transforms are performed on the conceptual data of the XML and not on the raw XML string. Thus the transform has somehow processed the xmlns when reading the XML.

My questions are:

  1. Is the XmlSerializer class making an implicit Internet connection to the xmlns?
  2. Is the XslCompiledTransform class doing something similar?
  3. If there are implicit connections, do they represent a security risk?
  4. And if so, what can be done to mitigate it (aside from forcing a blank namespace)?

As per @canton7's request, here is the class definition I'm using for the EncryptedData, as well as the fragment showing where it is referenced:

    ...
    [XmlElement("EncryptedData")]
    public EncryptedData EncryptedData { get; set; }
    ...

    public class EncryptedData
    {
        [XmlAttribute("Id")]
        public string Id { get; set; }

        [XmlAttribute("Type")]
        public string Type { get; set; }

        [XmlAttribute("xmlns")]
        public string Xmlns { get; set; }

        [XmlAttribute("length")]
        public int Length { get; set; }

        [XmlElement("EncryptionMethod")]
        public EncryptionMethod EncryptionMethod { get; set; }

        [XmlElement("CipherData")]
        public CipherData CipherData { get; set; }
    }

Peter M
  • How are you deserializing this? Please post your code. In particular, you're probably missing some attributes that put your models into the right namespace (e.g. `[XmlElement(Namespace = .....)]`) – canton7 Mar 26 '21 at 15:02
  • It's likely VS already knows about `http://www.w3.org/2001/04/xmlenc#`, without needing to hit the network, but I don't know for sure – canton7 Mar 26 '21 at 15:05
  • The namespace will be validated as long as it is reachable; otherwise, it is ignored. For serialization to work, the namespaces in the XML and the declared namespaces in the C# classes must match. The default, if not specified, is an empty string. – jdweng Mar 26 '21 at 15:21
  • @jdweng That's incorrect -- there's no such concept of validating a namespace. You validate against an XSD, but even then that's only done if you ask for it – canton7 Mar 26 '21 at 15:40
  • @canton7 I've added the relevant code. Note that the Type and xmlns strings are only recognized when I force a blank xmlns when deserializing. – Peter M Mar 26 '21 at 15:58
  • @jdweng Hmmm.. are you suggesting that the issue is not decorating the deserialization class with the correct ns rather than the xml not conforming to the definition? – Peter M Mar 26 '21 at 16:01
  • @PeterM You need to put your elements in the right namespace, using the `Namespace` property on the relevant XmlSerializer attributes. – canton7 Mar 26 '21 at 16:02
  • @canton7 So I have an X/Y question then? That not doing the decoration is why I can't deserialize and that the concept of visiting the xmlns URL is just my imagination? Or is it still a valid concern? – Peter M Mar 26 '21 at 16:05
  • Namespaces aren't used for validation. It might be that VS is recognising the namespace and is trying to helpfully highlight things (maybe it's even making network requests as part of that, but I instinctually doubt that), but that won't be affecting XmlSerializer at all. XmlSerializer is failing because you've defined classes in one xml namespace, but trying to use those to deserialize xml with elements in a different namespace – canton7 Mar 26 '21 at 16:13
  • @canton7 That sounds like a real answer, especially if you provide the correct decoration :D – Peter M Mar 26 '21 at 16:14
  • There will be other SO posts about that -- I'll try and find one – canton7 Mar 26 '21 at 16:16
  • Does this help? https://stackoverflow.com/questions/1556874/user-xmlns-was-not-expected-deserializing-twitter-xml – canton7 Mar 26 '21 at 16:29
  • @canton7 Yes that works – Peter M Mar 26 '21 at 17:40
  • Glad to hear you've sorted it! – canton7 Mar 26 '21 at 17:41

1 Answer


Firstly, a namespace is not a URL! A namespace is just an arbitrary string that is used as an identifier and compared character for character. The reason that namespaces usually look like URLs is that, if everyone uses a URL under a domain that they own or control, there is no chance of a name clash with anyone else's element of the same name. Even when the namespace is in URL form, there is usually nothing actually at that address, and XML parsers, including those behind XmlSerializer and XslCompiledTransform, do not try to access it.
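
To illustrate that the namespace is treated purely as a name, here is a minimal sketch that parses a document whose namespace is not a URL at all. The namespace `urn:example:not-a-real-location` and the class name are made up for this example; nothing is fetched, because there is nothing to fetch.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceIsJustAString
{
    // Parses the XML and returns the namespace name of the root element.
    public static string RootNamespace(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return root.Name.NamespaceName;
    }

    public static void Main()
    {
        // Hypothetical namespace: a plain identifier, not a resolvable address.
        string xml = "<Item xmlns=\"urn:example:not-a-real-location\"><Name>x</Name></Item>";

        // prints "urn:example:not-a-real-location"
        Console.WriteLine(RootNamespace(xml));
    }
}
```

The parser only records the string and uses it to qualify element names, which is the same thing it does with `http://www.w3.org/2001/04/xmlenc#`.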

Visual Studio maintains a directory of schema files for all Microsoft and standard schemas. You can add your own there if you want intellisense when editing the XML. Check the documentation for the location.

In the absence of an xsd known to VS, the only attribute that is understood by either VS or XmlSerializer is xmlns.

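The reason your deserializer won't work is that your classes need to declare the namespace their elements are in. The XML puts EncryptedData and its children in the xmlenc namespace, while your classes, with no namespace specified, expect them in the empty namespace. It's easy to miss this, as most of the examples you will see on the internet don't use namespaces.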

A good way to go about deserializing is to use xsd.exe to generate classes from either an xsd schema or a sample data file. The code it produces is horrible, but it can guide you in writing your prettier version. If you had done this, you would have seen that the attributes on the generated code include a namespace, e.g.

    [XmlElement("EncryptedData", Namespace = "http://www.w3.org/2001/04/xmlenc#")]

When creating this manually it is obviously cleaner to put the namespace in a const string.
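
As a rough sketch of what the hand-written version could look like, the following puts the namespace in a const and applies it to each element. The `XmlEnc` and `XmlEncDemo` classes, the shapes of `EncryptionMethod` and `CipherData`, and the shortened sample values are assumptions made for this example, since the question does not show them. `[XmlRoot]` is used here only so the snippet can deserialize `EncryptedData` on its own; in the question it is a property of a parent class, where the `[XmlElement]` form above applies. The `Xmlns` property is left out because the namespace is expressed through `Namespace` instead.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public static class XmlEnc
{
    // The namespace name, kept in one place.
    public const string Ns = "http://www.w3.org/2001/04/xmlenc#";
}

[XmlRoot("EncryptedData", Namespace = XmlEnc.Ns)]
public class EncryptedData
{
    // Unprefixed attributes are not in any namespace, so none is given here.
    [XmlAttribute("Id")] public string Id { get; set; }
    [XmlAttribute("Type")] public string Type { get; set; }
    [XmlAttribute("length")] public int Length { get; set; }

    [XmlElement("EncryptionMethod", Namespace = XmlEnc.Ns)]
    public EncryptionMethod EncryptionMethod { get; set; }

    [XmlElement("CipherData", Namespace = XmlEnc.Ns)]
    public CipherData CipherData { get; set; }
}

// Assumed shape; not shown in the question.
public class EncryptionMethod
{
    [XmlAttribute("Algorithm")] public string Algorithm { get; set; }
}

// Assumed shape; not shown in the question.
public class CipherData
{
    [XmlElement("CipherValue", Namespace = XmlEnc.Ns)]
    public string CipherValue { get; set; }
}

public static class XmlEncDemo
{
    // Shortened stand-in for the XML in the question (placeholder values).
    public const string Sample = @"<EncryptedData Id='abc'
    Type='http://www.w3.org/2001/04/xmlenc#Element'
    xmlns='http://www.w3.org/2001/04/xmlenc#' length='44'>
  <EncryptionMethod Algorithm='http://www.w3.org/2001/04/xmlenc#aes256-cbc' />
  <CipherData><CipherValue>AAAA</CipherValue></CipherData>
</EncryptedData>";

    public static EncryptedData Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(EncryptedData));
        using (var reader = new StringReader(xml))
        {
            return (EncryptedData)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        EncryptedData data = Parse(Sample);
        Console.WriteLine(data.EncryptionMethod.Algorithm);
        Console.WriteLine(data.CipherData.CipherValue);
    }
}
```

With the namespace declared on the classes, the XML can be deserialized as-is, without stripping xmlns or forcing a blank namespace through a custom reader.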

The rules about what does and does not have a namespace qualifier can be quite confusing hence my recommendation to generate sample code first.

If you actually want schema validation, you have to do it explicitly yourself by passing a validating XmlReader to XmlSerializer.Deserialize; that is, validation is optionally done as part of reading the XML, not as part of the deserialization.
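
A minimal sketch of that opt-in validation follows. The `Note` type, the `urn:example:note` namespace, and the inline schema are all invented for this example so that it is self-contained; for the XML in the question you would load the real xmlenc schema from a local file instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type and namespace, used only for this illustration.
[XmlRoot("Note", Namespace = "urn:example:note")]
public class Note
{
    [XmlElement("Body", Namespace = "urn:example:note")]
    public string Body { get; set; }
}

public static class ValidatingDemo
{
    // Inline schema so that no file or network access is involved.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:example:note' elementFormDefault='qualified'>
  <xs:element name='Note'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Body' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Deserializes through a validating reader and returns any validation messages.
    public static List<string> Validate(string xml, out Note note)
    {
        var problems = new List<string>();

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add("urn:example:note", XmlReader.Create(new StringReader(Xsd)));
        // With a handler attached, validation problems are collected instead of thrown.
        settings.ValidationEventHandler += (sender, e) => problems.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            note = (Note)new XmlSerializer(typeof(Note)).Deserialize(reader);
        }
        return problems;
    }

    public static void Main()
    {
        Note note;
        List<string> problems = Validate(
            "<Note xmlns='urn:example:note'><Body>hi</Body><Extra/></Note>", out note);
        foreach (string message in problems)
        {
            Console.WriteLine(message);
        }
    }
}
```

The serializer itself is unchanged; only the reader it is handed differs, which is why nothing is validated unless you set this up.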