5

When I deserialize an XML document with XmlSerializer (reading it through an XmlTextReader), an element for which there is no corresponding class or property is simply ignored.

Note: this is not about required elements that are missing from the XML, but about elements that are present in the XML text without having an equivalent property in code.

I would have expected an exception, because if the element is missing from the runtime data and I serialize the object again later, the resulting XML document will differ from the original one. So it's not safe to ignore it (in my real-world case I had simply forgotten to define one of the 99+ classes the document contains, and I didn't notice at first).

So is this normal, and if yes, why? Can I somehow request an exception whenever an element cannot be deserialized?

In the following example XML I have purposely misspelled the element name as "MyComandElement" to illustrate the core problem:

<MyRootElement>
    <MyComandElement/>
</MyRootElement>

MyRootElement.cs:

public class CommandElement { }

public class MyRootElement
{
    public CommandElement MyCommandElement {get; set;}
}

Deserialization:

XmlSerializer xmlSerializer = new XmlSerializer(typeof(MyRootElement));
MyRootElement mbs2;
// Dispose the reader even if Deserialize throws.
using (XmlTextReader xmlReader = new XmlTextReader(@"pgtest.xml"))
{
    mbs2 = (MyRootElement)xmlSerializer.Deserialize(xmlReader);
}
oliver
  • Possible duplicate of [Can I fail to deserialize with XmlSerializer in C# if an element is not found?](https://stackoverflow.com/questions/259726/can-i-fail-to-deserialize-with-xmlserializer-in-c-sharp-if-an-element-is-not-fou) – ProgrammingLlama Apr 19 '18 at 06:31
  • @john: I'll check it... – oliver Apr 19 '18 at 06:31
  • As far as I can see, the linked question is about the reverse case, i.e. the data type (there: int) exists, but the element is missing from the XML (instead of missing from the code). – oliver Apr 19 '18 at 06:36
  • Unless I'm mistaken it should cover both cases. It has been a long time since I've needed to work with XML, however. – ProgrammingLlama Apr 19 '18 at 06:42
  • @john: you mean the question&answer should *cover both cases*, or the deserializer should *cover both cases*? – oliver Apr 19 '18 at 06:44
  • The deserializer with schema verification. – ProgrammingLlama Apr 19 '18 at 06:46
  • @john: I'm afraid I have no clue about how to apply this question to my problem. But I have been able to solve it the most natural way, see my answer. – oliver Apr 19 '18 at 07:34
  • The way I suggested relies on creating a document which describes what the XML should look like (an xsd). Your way seems easier though :) – ProgrammingLlama Apr 19 '18 at 07:38
  • It is like this to provide a measure of "forward compatibility" - an application which loads an XML file will be able to load one produced by a new version which has additional data. If you want to handle different versions, you could add a `Version` property to the XML. Or subscribe to the UnknownElement event, as you did. – Matthew Watson Apr 19 '18 at 07:55
  • FYI, according to the [documentation](https://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.aspx) for `XmlTextReader`: *Starting with the .NET Framework 2.0, we recommend that you use the [`System.Xml.XmlReader`](https://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx) class instead.* The accepted solution works with both. – dbc May 23 '18 at 16:48

2 Answers

8

As I have found out by accident during further research, this problem is actually ridiculously easy to solve because...

...XmlSerializer supports events! All one has to do is define an event handler for unknown elements, i.e. elements that appear in the XML but have no counterpart in the class being deserialized:

void Serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
    throw new Exception("Unknown element "+e.Element.Name+" found in "
        +e.ObjectBeingDeserialized.ToString()+" in line "
        +e.LineNumber+" at position "+e.LinePosition);
}

and register the event with XmlSerializer:

xmlSerializer.UnknownElement += Serializer_UnknownElement;
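Put together, the handler and the registration might look like the following minimal sketch. The `StrictXml` helper and its `Deserialize<T>` method are hypothetical names introduced only for illustration, the XML is read from a string instead of `pgtest.xml`, and `XmlReader.Create` is used instead of `XmlTextReader`; the model classes are the ones from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class CommandElement { }

public class MyRootElement
{
    public CommandElement MyCommandElement { get; set; }
}

public static class StrictXml
{
    // Hypothetical helper (not part of the framework): deserializes T and
    // throws as soon as the XML contains an element with no matching member.
    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));

        // Raised by XmlSerializer for every element it cannot map to a member.
        serializer.UnknownElement += (sender, e) =>
        {
            throw new InvalidOperationException(
                "Unknown element " + e.Element.Name +
                " in line " + e.LineNumber +
                " at position " + e.LinePosition);
        };

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
```

Calling `StrictXml.Deserialize<MyRootElement>(...)` with the misspelled document from the question should now fail instead of silently returning an object whose `MyCommandElement` is null. As far as I know, `Deserialize` wraps exceptions thrown during deserialization in an `InvalidOperationException` of its own, so the message built in the handler is best looked for in `InnerException`.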

The topic is covered on MSDN, where one also learns that

By default, after calling the Deserialize method, the XmlSerializer ignores XML attributes of unknown types.

Not surprisingly, there are also events for unknown attributes (UnknownAttribute), unknown nodes (UnknownNode) and unreferenced objects (UnreferencedObject).
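The other events can be wired up the same way. The sketch below collects warnings in a list instead of throwing, which may be preferable when unknown content should be reported but not treated as fatal. `LenientXml` and its `Deserialize<T>` method are hypothetical names used only for illustration; the events and the event-args properties are the ones `XmlSerializer` exposes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class CommandElement { }

public class MyRootElement
{
    public CommandElement MyCommandElement { get; set; }
}

public static class LenientXml
{
    // Hypothetical helper: deserializes T and records everything the
    // serializer could not map, rather than aborting.
    public static T Deserialize<T>(string xml, List<string> warnings)
    {
        var serializer = new XmlSerializer(typeof(T));

        serializer.UnknownAttribute += (sender, e) =>
            warnings.Add("Unknown attribute " + e.Attr.Name +
                         " in line " + e.LineNumber);

        // UnknownNode is the general event; it may overlap with the more
        // specific ones, so the same item can be reported more than once.
        serializer.UnknownNode += (sender, e) =>
            warnings.Add("Unknown node " + e.NodeType + " " + e.Name +
                         " in line " + e.LineNumber);

        // Only relevant for SOAP-encoded XML; harmless to register otherwise.
        serializer.UnreferencedObject += (sender, e) =>
            warnings.Add("Unreferenced object with id " + e.UnreferencedId);

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
```

After the call, the caller can inspect `warnings` and decide whether a non-empty list should be logged or turned into an error.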

oliver
5

So is this normal and if yes, why?

Because maybe you're consuming someone else's XML document and whilst they define 300 different elements within their XML, you only care about two. Should you be forced to create classes for all of their elements and deserialize all of them just to be able to access the two you care about?

Or perhaps you're working with a system that is going to be in flux over time. You're writing code that consumes today's XML, and if new elements or attributes are introduced later, they shouldn't stop your tested and deployed code from continuing to consume those parts of the XML that it does understand. (Insert caveat here that, hopefully, if you're in such a situation, you or the XML author don't later introduce elements that must be understood in order to process the document correctly.)

These are two sides of the same coin: both are reasons why it can be desirable for the system not to blow up when it encounters unexpected parts within the XML document it's being asked to deserialize.

Damien_The_Unbeliever
  • This makes a lot more sense to me, now that I know that I can also opt for safety by handling the above events and, for example, throwing an exception. – oliver Apr 19 '18 at 08:22