In one of my C# projects I use the WCF DataContractSerializer for serialization to XML. The framework, however, consists of multiple extension modules that may or may not be loaded, depending on the startup configuration (I use MEF, in case it matters). The list of modules may grow in the future, and I fear that this may someday cause problems with module-specific data. As I understand it, I can implement a data contract resolver to help the serializer locate types in both directions, but what happens if the project contains data it cannot interpret because the associated module is not loaded?

I am looking for a solution that preserves existing serialized data when the full set of modules is not loaded (or not even available). I think of this as a way to tell the deserializer "if you don't understand what you get, then don't try to deserialize it, but please keep the data somewhere so that you can put it back when serializing the next time". I think my problem is related to round-tripping, but so far I have not found a hint on how to deal with a case where complex types may be added or removed between serialization actions.

Minimal example: Suppose I start my application with the optional modules A, B and C and produce the following XML (AData, BData and CData are in a collection and may be all derived from a common base class):

<Project xmlns="http://schemas.datacontract.org/2004/07/TestApplication" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <Data>
        <ModuleData i:type="AData">
            <A>A</A>
        </ModuleData>
        <ModuleData i:type="BData">
            <B>B</B>
        </ModuleData>
        <ModuleData i:type="CData">
            <C>C</C>
        </ModuleData>
    </Data>
</Project>

In case I skip module C (containing the definition of CData) and load the same project, then the serializer fails because it has no idea how to deal with CData. If I can somehow manage to convince the framework to keep the data and leave it untouched until someone opens the project again with module C, then I win. Of course I could implement dynamic data structures for storing extension data, e.g., key-value trees, but it would be neat to use the existing serialization framework also in extension modules. Any hint on how to achieve this is highly appreciated!

The example code to produce the above output is as follows:

using System;
using System.IO;
using System.Collections.Generic;
using System.Runtime.Serialization;

namespace TestApplication
{
    // common base class
    [DataContract]
    public class ModuleData : IExtensibleDataObject
    {
        public virtual ExtensionDataObject ExtensionData { get; set; }
    }

    [DataContract]
    public class AData : ModuleData
    {
        [DataMember]
        public string A { get; set; }
    }

    [DataContract]
    public class BData : ModuleData
    {
        [DataMember]
        public string B { get; set; }
    }

    [DataContract]
    public class CData : ModuleData
    {
        [DataMember]
        public string C { get; set; }
    }

    [DataContract]
    [KnownType(typeof(AData))]
    [KnownType(typeof(BData))]
    public class Project
    {
        [DataMember]
        public List<ModuleData> Data { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // new project object
            var project1 = new Project()
            {
                Data = new List<ModuleData>()
                {
                    new AData() { A = "A" },
                    new BData() { B = "B" },
                    new CData() { C = "C" }
                }
            };

            // serialization; make CData explicitly known to simulate presence of "module C"
            var stream = new MemoryStream();
            var serializer1 = new DataContractSerializer(typeof(Project), new[] { typeof(CData) });
            serializer1.WriteObject(stream, project1);

            stream.Position = 0;
            var reader = new StreamReader(stream);
            Console.WriteLine(reader.ReadToEnd());

            // deserialization; skip "module C"
            stream.Position = 0;
            var serializer2 = new DataContractSerializer(typeof(Project));
            var project2 = serializer2.ReadObject(stream) as Project;
        }
    }
}

I also uploaded a VS2015 solution here.

dbc
  • Can you give some details as to how you generated that XML initially? When creating XML from a polymorphic list, `DataContractSerializer` uses the [known type](https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/data-contract-known-types) mechanism and produces XML that looks like ` ` – dbc Jul 31 '17 at 10:29
  • Notice the `i:type="AData"`? That's a type hint using the standard [xsi:type](https://www.w3.org/TR/xmlschema-1/#xsi_type) attribute. If, instead, your collection item's element names are changing, that suggests you are really using `XmlSerializer`. Can you confirm? Can you add some code to your question so we can get an idea of what you are doing? – dbc Jul 31 '17 at 10:30
  • Thanks for your reply. My issue is not to get this stuff serialized, provided all modules are present. I have a problem when the types are not known (e.g., if I loaded only a subset of modules), because the serializer throws an exception then. I'll take the time to boil this down to a minimal example and edit my question accordingly. – Christian Waluga Jul 31 '17 at 14:29
  • If we don't know how you're serializing and deserializing when all modules are present, we may provide non-helpful answers about dealing with the situation when some modules are missing. For instance, if we provide a solution that involves `DataContractSerializer` and you're actually using `XmlSerializer` then the answer will not be usable. – dbc Jul 31 '17 at 18:33
  • Ah, I see... my XML example was more what I anticipated the serializer would generate than what it actually does. I provided a working example which hopefully clarifies things. Thanks so much for pointing that out! – Christian Waluga Aug 01 '17 at 07:57

2 Answers

Your problem is that you have a polymorphic known type hierarchy, and you would like to use the round-tripping mechanism of DataContractSerializer to read and save "unknown" known types, specifically XML elements with an xsi:type type hint referring to a type not currently loaded into your app domain.

Unfortunately, this use case simply isn't implemented by the round-tripping mechanism. That mechanism is designed to cache unknown data members inside an ExtensionData object, provided that the data contract object itself can be successfully deserialized and implements IExtensibleDataObject. Unfortunately, in your situation the data contract object cannot be constructed precisely because the polymorphic subtype is unrecognized; instead the following exception gets thrown:

System.Runtime.Serialization.SerializationException occurred
Message="Error in line 4 position 6. Element 'http://www.Question45412824.com:ModuleData' contains data of the 'http://www.Question45412824.com:CData' data contract. The deserializer has no knowledge of any type that maps to this contract. Add the type corresponding to 'CData' to the list of known types - for example, by using the KnownTypeAttribute attribute or by adding it to the list of known types passed to DataContractSerializer."

Even if I try to create a custom generic collection marked with [CollectionDataContract] that implements IExtensibleDataObject to cache items with unrecognized contracts, the same exception gets thrown.
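For contrast, here is a minimal sketch of the case that round-tripping does handle: an unknown data member on a type whose contract is itself recognized. The names ItemV1, ItemV2, RoundTripDemo and the namespace http://example.com/demo are made up for this illustration and are not part of the question's code.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical "newer" contract: has a member the older version does not know.
[DataContract(Name = "Item", Namespace = "http://example.com/demo")]
public class ItemV2
{
    [DataMember(Order = 0)] public string Name { get; set; }
    [DataMember(Order = 1)] public string Extra { get; set; }
}

// Hypothetical "older" contract with the same name and namespace.
// It implements IExtensibleDataObject, so unknown members are cached.
[DataContract(Name = "Item", Namespace = "http://example.com/demo")]
public class ItemV1 : IExtensibleDataObject
{
    [DataMember(Order = 0)] public string Name { get; set; }
    public ExtensionDataObject ExtensionData { get; set; }
}

public static class RoundTripDemo
{
    static string Write<T>(T obj)
    {
        using (var stream = new MemoryStream())
        {
            new DataContractSerializer(typeof(T)).WriteObject(stream, obj);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    static T Read<T>(string xml)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (T)new DataContractSerializer(typeof(T)).ReadObject(stream);
        }
    }

    // Writes an ItemV2, passes it through ItemV1 and reads it back as ItemV2.
    public static bool ExtraSurvivesRoundTrip()
    {
        var xmlV2 = Write(new ItemV2 { Name = "n", Extra = "kept" });
        var asV1 = Read<ItemV1>(xmlV2); // "Extra" is unknown here and goes into ExtensionData
        var xmlV1 = Write(asV1);        // ExtensionData is written back out
        return Read<ItemV2>(xmlV1).Extra == "kept";
    }

    public static void Main()
    {
        Console.WriteLine(ExtraSurvivesRoundTrip());
    }
}
```

The difference from the question's situation is that here the element's contract (Item) resolves to a loaded type, so there is an object on which ExtensionData can be stored. With an unrecognized xsi:type there is no such object.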

One solution is to take advantage of the fact that your problem is slightly less difficult than the round-tripping problem. You (the software architect) actually know all possible polymorphic subtypes. Your software does not, because it isn't always loading the assemblies that contain them. Thus what you can do is load lightweight dummy types instead of the real types when the real types aren't needed. As long as the dummy types implement IExtensibleDataObject and have the same data contract namespace and name as the real types, their data contracts will be interchangeable with the "real" data contracts in polymorphic collections.

Thus, if you define your types as follows, adding a Dummies.CData dummy placeholder:

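using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;
using System.Xml.Linq;

public static class Namespaces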
{
    // The data contract namespace for your project.
    public const string ProjectNamespace = "http://www.Question45412824.com"; 
}

// common base class
[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class ModuleData : IExtensibleDataObject
{
    public ExtensionDataObject ExtensionData { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class AData : ModuleData
{
    [DataMember]
    public string A { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class BData : ModuleData
{
    [DataMember]
    public string B { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
[KnownType(typeof(AData))]
[KnownType(typeof(BData))]
public class Project
{
    [DataMember]
    public List<ModuleData> Data { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class CData : ModuleData
{
    [DataMember]
    public string C { get; set; }
}

namespace Dummies
{
    [DataContract(Namespace = Namespaces.ProjectNamespace)]
    public class CData : ModuleData
    {
    }
}

You will be able to deserialize your Project object using either the "real" CData or the "dummy" version, as shown with the test below:

class Program
{
    static void Main(string[] args)
    {
        new TestClass().Test();
    }
}

class TestClass
{
    public virtual void Test()
    {
        // new project object
        var project1 = new Project()
        {
            Data = new List<ModuleData>()
            {
                new AData() { A = "A" },
                new BData() { B = "B" },
                new CData() { C = "C" }
            }
        };

        // serialization; make CData explicitly known to simulate presence of "module C"
        var extraTypes = new[] { typeof(CData) };
        var extraTypesDummy = new[] { typeof(Dummies.CData) };

        var xml = project1.SerializeXml(extraTypes);

        ConsoleAndDebug.WriteLine(xml);

        // Demonstrate that the XML can be deserialized with the dummy CData type.
        TestDeserialize(project1, xml, extraTypesDummy);

        // Demonstrate that the XML can be deserialized with the real CData type.
        TestDeserialize(project1, xml, extraTypes);

        try
        {
            // Demonstrate that the XML cannot be deserialized without either the dummy or real type.
            TestDeserialize(project1, xml, new Type[0]);
            Assert.IsTrue(false);
        }
        catch (AssertionFailedException ex)
        {
            Console.WriteLine("Caught unexpected exception: ");
            Console.WriteLine(ex);
            throw;
        }
        catch (Exception ex)
        {
            ConsoleAndDebug.WriteLine(string.Format("Caught expected exception: {0}", ex.Message));
        }
    }

    public void TestDeserialize<TProject>(TProject project, string xml, Type[] extraTypes)
    {
        TestDeserialize<TProject>(xml, extraTypes);
    }

    public void TestDeserialize<TProject>(string xml, Type[] extraTypes)
    {
        var project2 = xml.DeserializeXml<TProject>(extraTypes);

        var xml2 = project2.SerializeXml(extraTypes);

        ConsoleAndDebug.WriteLine(xml2);

        // Assert that the incoming and re-serialized XML are equivalent (no data was lost).
        Assert.IsTrue(XNode.DeepEquals(XElement.Parse(xml), XElement.Parse(xml2)));
    }
}

public static partial class DataContractSerializerHelper
{
    public static string SerializeXml<T>(this T obj, Type [] extraTypes)
    {
        return obj.SerializeXml(new DataContractSerializer(obj == null ? typeof(T) : obj.GetType(), extraTypes));
    }

    public static string SerializeXml<T>(this T obj, DataContractSerializer serializer)
    {
        serializer = serializer ?? new DataContractSerializer(obj == null ? typeof(T) : obj.GetType());
        using (var textWriter = new StringWriter())
        {
            var settings = new XmlWriterSettings { Indent = true };
            using (var xmlWriter = XmlWriter.Create(textWriter, settings))
            {
                serializer.WriteObject(xmlWriter, obj);
            }
            return textWriter.ToString();
        }
    }

    public static T DeserializeXml<T>(this string xml, Type[] extraTypes)
    {
        return xml.DeserializeXml<T>(new DataContractSerializer(typeof(T), extraTypes));
    }

    public static T DeserializeXml<T>(this string xml, DataContractSerializer serializer)
    {
        using (var textReader = new StringReader(xml ?? ""))
        using (var xmlReader = XmlReader.Create(textReader))
        {
            return (T)(serializer ?? new DataContractSerializer(typeof(T))).ReadObject(xmlReader);
        }
    }
}

public static class ConsoleAndDebug
{
    public static void WriteLine(object s)
    {
        Console.WriteLine(s);
        Debug.WriteLine(s);
    }
}

public class AssertionFailedException : System.Exception
{
    public AssertionFailedException() : base() { }

    public AssertionFailedException(string s) : base(s) { }
}

public static class Assert
{
    public static void IsTrue(bool value)
    {
        if (value == false)
            throw new AssertionFailedException("failed");
    }
}

Another solution would be to replace your List<ModuleData> with a custom collection that implements IXmlSerializable and handles the polymorphic serialization entirely manually, caching the XML for unknown polymorphic subtypes in a list of unknown elements. I wouldn't recommend that however since even straightforward implementations of IXmlSerializable can be quite complex, as shown here and, e.g., here.

dbc
  • +1: this really helps me going forward! I however fear that I might run into problems later if I use polymorphic structures in the respective modules. For the top level I could simply "reserve" enough dummy types for the modules and be happy with that, but in case the modules use polymorphic data I guess I would have to have dummies for all of these subtypes as well. I will thus evaluate the second possible solution as well. – Christian Waluga Aug 02 '17 at 08:19
  • @ChristianWaluga - you only need dummies for polymorphic types that are used outside their native assembly. It puts a premium on declaring types as `internal`, which we don't always bother to do. One way to structure your code might be, for each optional module, create two possible DLLs - one with the real external types, and one with the dummy external types. – dbc Aug 02 '17 at 08:49

Following dbc's wonderful suggestion of using dummies so that the round-tripping mechanism does the job, I made the solution more generic by generating the dummy types on the fly as needed.

The core of this solution is the following simple function that internally invokes the C# compiler:

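private Type CreateDummyType(string typeName, string typeNamespace)
{
    // The class name is arbitrary; the serializer matches on the data contract name and namespace.
    var className = $"DummyClass_{random_.Next()}";
    var code = $"[System.Runtime.Serialization.DataContract(Name=\"{typeName}\", Namespace=\"{typeNamespace}\")] public class {className} : ModuleData {{}}";

    using (var provider = new CSharpCodeProvider())
    {
        var parameters = new CompilerParameters();
        parameters.ReferencedAssemblies.Add("System.Runtime.Serialization.dll");
        parameters.ReferencedAssemblies.Add(GetType().Assembly.Location); // this assembly (for ModuleData)

        var results = provider.CompileAssemblyFromSource(parameters, code);

        // Accessing CompiledAssembly after a failed compilation throws, so report the failure explicitly.
        if (results.Errors.HasErrors)
            throw new InvalidOperationException($"Could not compile a dummy type for '{typeNamespace}:{typeName}'.");

        return results.CompiledAssembly.GetType(className);
    }
}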

I combined this with a DataContractResolver that takes care of any unknown types and generates dummies as needed to preserve their data during subsequent (de)serializations.

For completeness, here is the latest iteration of the sample code:

using System;
using System.IO;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.Diagnostics;
using System.Xml;
using System.Xml.Linq;
using Microsoft.CSharp;
using System.CodeDom.Compiler;

public static class Namespaces
{
    public const string BaseNamespace = "http://www.Question45412824.com";
    public const string ProjectNamespace = BaseNamespace + "/Project";
    public const string ExtensionNamespace = BaseNamespace + "/Extension";
}

// common base class
[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class ModuleData : IExtensibleDataObject
{
    public ExtensionDataObject ExtensionData { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class AData : ModuleData
{
    [DataMember]
    public string A { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
public class BData : ModuleData
{
    [DataMember]
    public string B { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
[KnownType(typeof(AData))]
[KnownType(typeof(BData))]
public class Project
{
    [DataMember]
    public List<ModuleData> Data { get; set; }
}

[DataContract(Namespace = Namespaces.ProjectNamespace)]
internal class CSubData : ModuleData
{
    [DataMember]
    public string Name { get; set; }
}


[DataContract(Namespace = Namespaces.ExtensionNamespace)]
public class CData : ModuleData
{
    [DataMember]
    public ModuleData C { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        new TestClass().Test();
    }
}

class TestClass
{
    public virtual void Test()
    {
        // new project object
        var project1 = new Project()
        {
            Data = new List<ModuleData>()
                {
                     new AData() { A = "A" },
                     new BData() { B = "B" },
                     new CData() { C = new CSubData() { Name = "C" } }
                }
        };

        // serialization; make CData explicitly known to simulate presence of "module C"
        var extraTypes = new[] { typeof(CData), typeof(CSubData) };

        ConsoleAndDebug.WriteLine("\n== Serialization with all types known ==");
        var xml = project1.SerializeXml(extraTypes);
        ConsoleAndDebug.WriteLine(xml);

        ConsoleAndDebug.WriteLine("\n== Deserialization and subsequent serialization WITH generic resolver and unknown types ==");
        TestDeserialize(project1, xml, new GenericDataContractResolver());

        ConsoleAndDebug.WriteLine("\n== Deserialization and subsequent serialization WITHOUT generic resolver and unknown types ==");
        try
        {
            // Demonstrate that the XML cannot be deserialized without the generic resolver.
            TestDeserialize(project1, xml, new Type[0]);
            Assert.IsTrue(false);
        }
        catch (AssertionFailedException ex)
        {
            Console.WriteLine("Caught unexpected exception: ");
            Console.WriteLine(ex);
            throw;
        }
        catch (Exception ex)
        {
            ConsoleAndDebug.WriteLine(string.Format("Caught expected exception: {0}", ex.Message));
        }
    }

    public void TestDeserialize<TProject>(TProject project, string xml, Type[] extraTypes)
    {
        TestDeserialize<TProject>(xml, extraTypes);
    }

    public void TestDeserialize<TProject>(string xml, Type[] extraTypes)
    {
        var project2 = xml.DeserializeXml<TProject>(extraTypes);

        var xml2 = project2.SerializeXml(extraTypes);

        ConsoleAndDebug.WriteLine(xml2);

        // Assert that the incoming and re-serialized XML are equivalent (no data was lost).
        Assert.IsTrue(XNode.DeepEquals(XElement.Parse(xml), XElement.Parse(xml2)));
    }

    public void TestDeserialize<TProject>(TProject project, string xml, DataContractResolver resolver)
    {
        TestDeserialize<TProject>(xml, resolver);
    }

    public void TestDeserialize<TProject>(string xml, DataContractResolver resolver)
    {
        var project2 = xml.DeserializeXml<TProject>(resolver);

        var xml2 = project2.SerializeXml(resolver);

        ConsoleAndDebug.WriteLine(xml2);

        // Assert that the incoming and re-serialized XML are equivalent (no data was lost).
        Assert.IsTrue(XNode.DeepEquals(XElement.Parse(xml), XElement.Parse(xml2)));
    }
}

public static partial class DataContractSerializerHelper
{
    public static string SerializeXml<T>(this T obj, Type[] extraTypes)
    {
        return obj.SerializeXml(new DataContractSerializer(obj == null ? typeof(T) : obj.GetType(), extraTypes));
    }

    public static string SerializeXml<T>(this T obj, DataContractResolver resolver)
    {
        return obj.SerializeXml(new DataContractSerializer(obj == null ? typeof(T) : obj.GetType(), null, int.MaxValue, false, false, null, resolver));
    }

    public static string SerializeXml<T>(this T obj, DataContractSerializer serializer)
    {
        serializer = serializer ?? new DataContractSerializer(obj == null ? typeof(T) : obj.GetType());
        using (var textWriter = new StringWriter())
        {
            var settings = new XmlWriterSettings { Indent = true };
            using (var xmlWriter = XmlWriter.Create(textWriter, settings))
            {
                serializer.WriteObject(xmlWriter, obj);
            }
            return textWriter.ToString();
        }
    }

    public static T DeserializeXml<T>(this string xml, DataContractResolver resolver)
    {
        return xml.DeserializeXml<T>(new DataContractSerializer(typeof(T), null, int.MaxValue, false, false, null, resolver));
    }

    public static T DeserializeXml<T>(this string xml, Type[] extraTypes)
    {
        return xml.DeserializeXml<T>(new DataContractSerializer(typeof(T), extraTypes));
    }

    public static T DeserializeXml<T>(this string xml, DataContractSerializer serializer)
    {
        using (var textReader = new StringReader(xml ?? ""))
        using (var xmlReader = XmlReader.Create(textReader))
        {
            return (T)(serializer ?? new DataContractSerializer(typeof(T))).ReadObject(xmlReader);
        }
    }
}

public static class ConsoleAndDebug
{
    public static void WriteLine(object s)
    {
        Console.WriteLine(s);
        Debug.WriteLine(s);
    }
}

public class AssertionFailedException : System.Exception
{
    public AssertionFailedException() : base() { }

    public AssertionFailedException(string s) : base(s) { }
}

public static class Assert
{
    public static void IsTrue(bool value)
    {
        if (value == false)
            throw new AssertionFailedException("failed");
    }
}

class GenericDataContractResolver : DataContractResolver
{
    private static readonly Random random_ = new Random();
    private static readonly Dictionary<Tuple<string, string>, Type> toType_ = new Dictionary<Tuple<string, string>, Type>();
    private static readonly Dictionary<Type, Tuple<string, string>> fromType_ = new Dictionary<Type, Tuple<string, string>>();

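    private Type CreateDummyType(string typeName, string typeNamespace)
    {
        // The class name is arbitrary; the serializer matches on the data contract name and namespace.
        var className = $"DummyClass_{random_.Next()}";
        var code = $"[System.Runtime.Serialization.DataContract(Name=\"{typeName}\", Namespace=\"{typeNamespace}\")] public class {className} : ModuleData {{}}";

        using (var provider = new CSharpCodeProvider())
        {
            var parameters = new CompilerParameters();
            parameters.ReferencedAssemblies.Add("System.Runtime.Serialization.dll");
            parameters.ReferencedAssemblies.Add(GetType().Assembly.Location); // this assembly (for ModuleData)

            var results = provider.CompileAssemblyFromSource(parameters, code);

            // Accessing CompiledAssembly after a failed compilation throws, so report the failure explicitly.
            if (results.Errors.HasErrors)
                throw new InvalidOperationException($"Could not compile a dummy type for '{typeNamespace}:{typeName}'.");

            return results.CompiledAssembly.GetType(className);
        }
    }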

    // Used at deserialization; allows users to map xsi:type name to any Type 
    public override Type ResolveName(string typeName, string typeNamespace, Type declaredType, DataContractResolver knownTypeResolver)
    {
        var type = knownTypeResolver.ResolveName(typeName, typeNamespace, declaredType, null);

        // resolve all unknown extension datasets; all others should be explicitly known.
        if (type == null && declaredType == typeof(ModuleData) && typeNamespace == Namespaces.ExtensionNamespace)
        {
            // if we already have this type cached, then return the cached one
            var typeNameAndNamespace = new Tuple<string, string>(typeName, typeNamespace);
            if (toType_.TryGetValue(typeNameAndNamespace, out type))
                return type;

            // else compile the dummy type and remember it in the cache
            type = CreateDummyType(typeName, typeNamespace);
            toType_.Add(typeNameAndNamespace, type);
            fromType_.Add(type, typeNameAndNamespace);
        }

        return type;
    }

    // Used at serialization; maps any Type to a new xsi:type representation
    public override bool TryResolveType(Type type, Type declaredType, DataContractResolver knownTypeResolver, out XmlDictionaryString typeName, out XmlDictionaryString typeNamespace)
    {
        if (knownTypeResolver.TryResolveType(type, declaredType, null, out typeName, out typeNamespace))
            return true; // known type

        // is the type one of our cached dummies?
        var typeNameAndNamespace = default(Tuple<string, string>);
        if (declaredType == typeof(ModuleData) && fromType_.TryGetValue(type, out typeNameAndNamespace))
        {
            typeName = new XmlDictionaryString(XmlDictionary.Empty, typeNameAndNamespace.Item1, 0);
            typeNamespace = new XmlDictionaryString(XmlDictionary.Empty, typeNameAndNamespace.Item2, 0);
            return true; // dummy type
        }

        return false; // unknown type
    }
}
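
As a possible variation on CreateDummyType above, the placeholder class could be emitted with System.Reflection.Emit rather than by invoking the C# compiler. The sketch below swaps in that technique. It is not the code used in this answer, and it has not been verified here in combination with the resolver. DummyTypeFactory and the example namespace are hypothetical names, and the ModuleData class is a stand-in for the real base class so that the snippet compiles on its own.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.Serialization;
using System.Threading;

// Stand-in for the real base class (assumption: same shape as ModuleData above).
[DataContract(Namespace = "http://example.com/demo")]
public class ModuleData : IExtensibleDataObject
{
    public ExtensionDataObject ExtensionData { get; set; }
}

// Hypothetical helper; emits an empty subclass of ModuleData carrying
// [DataContract(Name = ..., Namespace = ...)] with the requested values.
public static class DummyTypeFactory
{
    private static readonly ModuleBuilder module_ =
        AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("DummyTypes"), AssemblyBuilderAccess.Run)
                       .DefineDynamicModule("DummyTypes");

    private static int counter_;

    public static Type Create(string contractName, string contractNamespace)
    {
        // Unique CLR name; only the contract name and namespace matter to the serializer.
        var className = "DummyClass_" + Interlocked.Increment(ref counter_);
        var builder = module_.DefineType(
            className, TypeAttributes.Public | TypeAttributes.Class, typeof(ModuleData));

        // Equivalent of [DataContract(Name = contractName, Namespace = contractNamespace)].
        var attributeType = typeof(DataContractAttribute);
        builder.SetCustomAttribute(new CustomAttributeBuilder(
            attributeType.GetConstructor(Type.EmptyTypes),
            new object[0],
            new[] { attributeType.GetProperty("Name"), attributeType.GetProperty("Namespace") },
            new object[] { contractName, contractNamespace }));

        return builder.CreateType();
    }
}

public static class DummyTypeFactoryDemo
{
    public static void Main()
    {
        var dummy = DummyTypeFactory.Create("CData", "http://example.com/demo/Extension");
        var contract = dummy.GetCustomAttribute<DataContractAttribute>();
        Console.WriteLine(contract.Name + " / " + contract.Namespace);
    }
}
```

One reason to consider this route is that it avoids a compiler invocation and temporary files for every unknown contract. Whether the serializer treats such dynamically emitted types exactly like compiled ones would need to be checked against the test in this answer.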