C# how to deserialize an xml tag embedded in text?

Question

I am trying to deserialize the XML documentation file that the C# compiler generates from doc comments, using an XmlSerializer. For reference, the generated file looks like this:

<?xml version="1.0"?>
<doc>
    <assembly>
        <name>Apt.Lib.Data.Product</name>
    </assembly>
    <members>
        <member name="P:MyNamespace.MyType.MyProperty">
            <summary>See <see cref="T:MyNamespace.MyOthertype"/> for more info</summary>
        </member>
        ...
    </members>
</doc>

The object I'm using to generate the serializer is:

    [XmlRoot("doc")]
    public class XmlDocumentation
    {
        public static readonly XmlSerializer Serializer = new XmlSerializer(typeof(XmlDocumentation));

        [XmlElement("assembly")]
        public AssemblyName Assembly { get; set; }
        [XmlArray("members")]
        [XmlArrayItem("member")]
        public List<Member> Members { get; set; }

        public class AssemblyName
        {
            [XmlElement("name")]
            public string Name { get; set; }
        }

        public class Member
        {
            [XmlAttribute("name")]
            public string Name { get; set; }
            [XmlElement("summary")]
            public string Summary { get; set; }
        }
    }

The problem occurs when the serializer encounters the embedded <see cref="..."/> tag. In that case the serializer throws the following exception:

System.InvalidOperationException : There is an error in XML document (147, 27). ----> System.Xml.XmlException : Unexpected node type Element. ReadElementString method can only be called on elements with simple or empty content. Line 147, position 27.

How can I capture the entire content of the summary tag as a string during deserialization?

@cdhowie: as I've laid it out, Summary is just a property of type string. It's not a class. — ChaseMedallion, Jun 13 '13 at 15:56
I can read, I swear. The problem seems to be that the serializer is not properly escaping the special XML characters in that string. — cdhowie, Jun 13 '13 at 15:56
@cdhowie: in this case I'm deserializing xml generated by the C# compiler, so I don't have any control over the generated content. I just want to set up my deserializer to work given the content. — ChaseMedallion, Jun 13 '13 at 16:00
I don't think that a class can accurately model this in a way that XmlSerializer will understand. You have text mixed in with elements, which is not a case that XmlSerializer was really built for. — cdhowie, Jun 13 '13 at 18:03
This answer might help you - http://stackoverflow.com/questions/26525403/deserialize-part-of-xml-into-string/26526485#26526485 — Dhanashree, Nov 17 '16 at 11:34
This link might help you - http://stackoverflow.com/questions/26525403/deserialize-part-of-xml-into-string/26526485#26526485 — Dhanashree, Nov 17 '16 at 11:35
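
One way to do what the question asks while staying with XmlSerializer is sketched below. It assumes that `[XmlAnyElement("summary")]` on a single `XmlElement` property captures the `<summary>` node as a DOM element, so its mixed content (text plus `<see .../>`) can be read back through `InnerXml`. The `SummaryElement` property name is a hypothetical addition, not part of the original class, and this is only a sketch rather than a definitive fix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("doc")]
public class XmlDocumentation
{
    [XmlElement("assembly")]
    public AssemblyName Assembly { get; set; }

    [XmlArray("members")]
    [XmlArrayItem("member")]
    public List<Member> Members { get; set; }

    public class AssemblyName
    {
        [XmlElement("name")]
        public string Name { get; set; }
    }

    public class Member
    {
        [XmlAttribute("name")]
        public string Name { get; set; }

        // Hypothetical property: holds <summary> as a DOM node rather than a
        // string, so child elements such as <see cref="..."/> are preserved.
        [XmlAnyElement("summary")]
        public XmlElement SummaryElement { get; set; }

        // Raw inner markup of <summary>, or null when the member has none.
        [XmlIgnore]
        public string Summary => SummaryElement?.InnerXml;
    }
}

public static class Program
{
    public static void Main()
    {
        const string xml = @"<?xml version=""1.0""?>
<doc>
    <assembly><name>Apt.Lib.Data.Product</name></assembly>
    <members>
        <member name=""P:MyNamespace.MyType.MyProperty"">
            <summary>See <see cref=""T:MyNamespace.MyOthertype""/> for more info</summary>
        </member>
    </members>
</doc>";

        var serializer = new XmlSerializer(typeof(XmlDocumentation));
        using (var reader = new StringReader(xml))
        {
            var doc = (XmlDocumentation)serializer.Deserialize(reader);
            // Prints the summary text with the embedded <see> element intact.
            Console.WriteLine(doc.Members[0].Summary);
        }
    }
}
```

Other child elements of `<member>` (for example `<param>` or `<returns>`) are not mapped in this sketch; by default XmlSerializer skips elements it has no member for.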

score 0 · Answer 1 · edited May 23 '17 at 12:13

The cref tag itself contains illegal characters. Specifically <, > can't be embedded in the contents of an XML element. You should sanitize the strings before they are serialized or deserialized.

You can do something like this if you need to be able to apply specific rules to how certain characters are escaped or substituted:

    string ScrubString(string dirty)
    {
        char[] charArray = dirty.ToCharArray();
        StringBuilder strBldr = new StringBuilder(dirty.Length);

        for (int i = 0; i < charArray.Length; i++)
        {
           if(IsXmlSafe(charArray[i]))
           {
              strBldr.Append(charArray[i]);
           }
           else
           {
              //do something to escape or replace that character. 
           }
        }
        return strBldr.ToString();
    }


    // True when c is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
    // Code points above #xFFFF never fit in a single char, so they are not tested here.
    bool IsXmlSafe(char c)
    {
       int charInt = Convert.ToInt32(c);

       return charInt == 9
           || charInt == 10
           || charInt == 13
           || (charInt >= 32    && charInt <= 55295)
           || (charInt >= 57344 && charInt <= 65533);
    }
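
As an alternative to a hand-written range check, the framework's own `XmlConvert.IsXmlChar` can do the per-character test. The sketch below is only an illustration; the `XmlCharFilter` class and `RemoveInvalidXmlChars` method names are hypothetical. Note that `<` and `>` are valid XML characters, so a filter like this leaves them alone; it only drops characters such as U+0000 that XML 1.0 forbids outright.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class XmlCharFilter
{
    // Drops every UTF-16 code unit that XmlConvert.IsXmlChar rejects.
    // Assumption: the text holds no surrogate pairs. Supplementary
    // characters need a pairwise check (XmlConvert.IsXmlSurrogatePair),
    // which this sketch leaves out.
    public static string RemoveInvalidXmlChars(string text)
    {
        return new string(text.Where(c => XmlConvert.IsXmlChar(c)).ToArray());
    }
}

public static class FilterDemo
{
    public static void Main()
    {
        // The NUL character is removed; the angle brackets are kept.
        Console.WriteLine(XmlCharFilter.RemoveInvalidXmlChars("a\u0000b<c>"));
    }
}
```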

You can also use some of the approaches here to just remove any illegal character using regex:

Invalid Characters in XML

answered Jun 13 '13 at 16:39

Matt Ringer

So are you saying that there is no way to deserialize the XML file generated by C#'s doc comments using XmlSerializer? – ChaseMedallion Jun 13 '13 at 16:58
I suppose I'm not really clear on how the cref tag is ending up in the Summary in the first place. Is it something you've added to your method summary manually? Have a look at http://msdn.microsoft.com/en-us/library/z04awywx.aspx; it looks like you can use cref as an attribute rather than an element. – Matt Ringer Jun 13 '13 at 17:34
@Ringer The cref attribute is valid; it allows you to link some text to other documentation. – cdhowie Jun 13 '13 at 18:27
@cdhowie right, the cref attribute is but it looks like he has an entire cref tag in the body of the summary rather than making it an attribute on the summary tag. Would be easier to tell if we could see the source file this was generated from. – Matt Ringer Jun 13 '13 at 18:33
@Ringer Putting the attribute on the summary tag would be the wrong thing to do, as that would link the entire summary. A `` might do the trick, though. – cdhowie Jun 13 '13 at 18:42
@cdhowie Sure, but the issue seems to be that the whole tag with illegal characters is ending up in the summary contents. Which would cause problems when he later tries to deserialize the document with the unescaped characters included. I don't know if there is another useful way to include the cref without making it part of the summary. – Matt Ringer Jun 13 '13 at 19:01
@Ringer The problem is not the unescaped characters -- they should not be escaped -- but rather that this document structure is not one that XmlSerializer was designed to handle. – cdhowie Jun 13 '13 at 19:02
@cdhowie fair enough, but if he wants to use the XmlSerializer he'll have to make it conform to the XmlSerializer doc structure. – Matt Ringer Jun 13 '13 at 19:14
@Ringer That is true. But I would suggest using a different mechanism to read the data rather than going out of the way to make the data compatible with one of many tools that can read XML. – cdhowie Jun 13 '13 at 19:26
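
The "different mechanism" suggested in the last comment could be LINQ to XML, which handles mixed text and element content directly. The following is a minimal sketch under that assumption; the `DocCommentReader` class and `ReadSummaries` method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DocCommentReader
{
    // Maps each member name to the raw inner XML of its <summary>.
    // Members without a name attribute or a <summary> are skipped.
    public static Dictionary<string, string> ReadSummaries(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("member")
            .Where(m => m.Attribute("name") != null && m.Element("summary") != null)
            .ToDictionary(
                m => (string)m.Attribute("name"),
                // Concatenating the child nodes keeps both the text and the
                // embedded elements such as <see cref="..."/>.
                m => string.Concat(m.Element("summary").Nodes()));
    }
}

public static class ReaderDemo
{
    public static void Main()
    {
        const string xml = @"<?xml version=""1.0""?>
<doc>
    <members>
        <member name=""P:MyNamespace.MyType.MyProperty"">
            <summary>See <see cref=""T:MyNamespace.MyOthertype""/> for more info</summary>
        </member>
    </members>
</doc>";

        foreach (var pair in DocCommentReader.ReadSummaries(xml))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

If only the plain text is wanted, `m.Element("summary").Value` returns the text content with the embedded elements dropped.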
