I have created the following wrapper method to disable DTD processing:

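using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Program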
{
    public static void Main(string[] args)
    {
        string s = @"<?xml version =""1.0"" encoding=""utf-16""?>
<ArrayOfSerializingTemplateItem xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance""> 
    <SerializingTemplateItem>
    </SerializingTemplateItem>
</ArrayOfSerializingTemplateItem >";
        try
        {
            XmlReader reader = XmlWrapper.CreateXmlReaderObject(s);
            XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
            Object ob = sr.Deserialize(reader);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            throw;
        }
        Console.ReadLine();
    }
}

public class XmlWrapper
{
    public static XmlReader CreateXmlReaderObject(string sr)
    {
        byte[] byteArray = Encoding.UTF8.GetBytes(sr);
        MemoryStream stream = new MemoryStream(byteArray);
        stream.Position = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.None;
        settings.DtdProcessing = DtdProcessing.Ignore;
        return XmlReader.Create(stream, settings);
    }        
}

public class SerializingTemplateItem
{
}

The above throws the exception "There is no Unicode byte order mark. Cannot switch to Unicode." (Demo fiddle here: https://dotnetfiddle.net/pGxOE9).

But if I create the XmlReader with the following code instead of calling the XmlWrapper method, it works fine:

 StringReader stringReader = new StringReader( xml );
 XmlReader reader = new XmlTextReader( stringReader );

However, I need to use the wrapper method because disabling DTD processing is a security requirement. I don't know why deserialization fails when I use my wrapper method. Any help will be highly appreciated.

dbc
James
  • I think the StringReader is defaulting to ANSI encoding and removing the Unicode characters. Check to make sure the data didn't change. – jdweng Sep 18 '18 at 21:34
  • Can't reproduce, see e.g. https://dotnetfiddle.net/f2Xfpe. We will need a [mcve] to help you further, including the XML string and a simplified version of `Type`. – dbc Sep 18 '18 at 22:12
  • Incidentally, you can deserialize directly from a `string` using a `StringReader` and `XmlReaderSettings` by passing the `StringReader` to `XmlReader.Create()`. See e.g. [here](https://stackoverflow.com/q/39494716) or [here](https://stackoverflow.com/a/44248192). Don't know if doing it this way will fix your problem but it might. It will also be more efficient. – dbc Sep 18 '18 at 23:32
  • Repro here: https://dotnetfiddle.net/pGxOE9 – dbc Sep 19 '18 at 20:53
  • @James - the c# string literal for the XML had some unrelated problems and the correct deserialization type was `List` not `SerializingTemplateItem`, so I went ahead and fixed those in the question. – dbc Sep 19 '18 at 21:36

1 Answer

Your problem is that you have encoded the XML into a MemoryStream using Encoding.UTF8, but the XML string itself claims, in the encoding attribute of its XML declaration, to be encoded in UTF-16:

<?xml version ="1.0" encoding="utf-16"?>
<ArrayOfSerializingTemplateItem> 
    <!-- Content omitted -->
</ArrayOfSerializingTemplateItem >

Apparently, when the XmlReader encounters this declaration, it tries to honor it and switch from UTF-8 to UTF-16, but fails, possibly because the stream really is encoded in UTF-8. Conversely, when the deprecated XmlTextReader encounters the declaration, it apparently just ignores it as not implemented, which happens to make things work in this situation.
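To make the mismatch concrete, here is a minimal sketch (not part of the original question) that prints the first few bytes of the same XML text under each encoding. The class name EncodingMismatchDemo and the helper FirstBytesHex are hypothetical names chosen for illustration; only standard Encoding and BitConverter calls are used.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Hypothetical helper: encodes `text` with `encoding` and returns
    // up to the first `count` bytes as a hex string such as "3C-3F".
    public static string FirstBytesHex(string text, Encoding encoding, int count)
    {
        byte[] bytes = encoding.GetBytes(text);
        return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root />";

        // UTF-8: one byte per ASCII character, so "<?xm" is 3C-3F-78-6D.
        Console.WriteLine(FirstBytesHex(xml, Encoding.UTF8, 4));

        // UTF-16 LE: two bytes per character, so "<?" is 3C-00-3F-00.
        // Note that GetBytes does not emit a byte order mark.
        Console.WriteLine(FirstBytesHex(xml, Encoding.Unicode, 4));

        // The UTF-16 LE byte order mark itself is FF-FE.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetPreamble()));
    }
}
```

The stream in the question holds the first byte pattern while the declaration promises UTF-16, which is consistent with the error message complaining about a missing byte order mark.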

The simplest way to resolve this is to read directly from the string by passing a StringReader to XmlReader.Create(TextReader, XmlReaderSettings):

public class XmlWrapper
{
    public static XmlReader CreateXmlReaderObject(string sr)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.None,
            DtdProcessing = DtdProcessing.Ignore,
        };
        return XmlReader.Create(new StringReader(sr), settings);
    }        
}

Since a C# string is always encoded internally in UTF-16, the encoding declaration in the XML will be ignored as irrelevant. This will also be more efficient, as the conversion to an intermediate byte array is skipped entirely.
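If a byte stream is unavoidable for some reason, one alternative (an assumption on my part, not something the question requires) is to make the bytes agree with the declaration by writing the UTF-16 byte order mark followed by the UTF-16 encoded text. The sketch below uses hypothetical names, Utf16StreamReaderSketch and CreateFromUtf16Stream, and should be treated as an illustration rather than a recommended replacement for the StringReader approach.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Utf16StreamReaderSketch
{
    // Hypothetical helper: builds a stream whose bytes really are UTF-16
    // (BOM included), so they match an encoding="utf-16" declaration.
    public static XmlReader CreateFromUtf16Stream(string xml)
    {
        var stream = new MemoryStream();

        // Encoding.GetBytes does not write a BOM, so write it explicitly.
        byte[] bom = Encoding.Unicode.GetPreamble();
        stream.Write(bom, 0, bom.Length);

        byte[] body = Encoding.Unicode.GetBytes(xml);
        stream.Write(body, 0, body.Length);
        stream.Position = 0;

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore,
            CloseInput = true, // dispose the stream together with the reader
        };
        return XmlReader.Create(stream, settings);
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root />";
        using (XmlReader reader = CreateFromUtf16Stream(xml))
        {
            reader.MoveToContent();
            Console.WriteLine(reader.LocalName);
        }
    }
}
```

This still pays for the intermediate byte array, so the StringReader version remains the simpler choice when the input is already a string.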

Incidentally, you should dispose of your XmlReader via a using statement:

Object ob;
using (var reader = XmlWrapper.CreateXmlReaderObject(s))
{
    XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
    ob = sr.Deserialize(reader);
}

Working sample fiddle here.

dbc