I'm having to recreate a vendor's XML file. I don't have access to their code or schema, so I'm doing this using XmlSerializer and attributes. I'm doing it this way because the system already uses a generic XmlWriter I've built to write other system XML files, so I'm killing two birds with one stone. Everything has been working out great, with the exception of one property value. The vendor XML looks like this:

<TextOutlTxt>
    <p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
       <span>SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</span>
    </p>
</TextOutlTxt>

Here's my property configuration:

    private string _value;

    [XmlElement("TextOutlTxt")]
    public XmlNode Value
    {
        get
        {
            string text = _value;
            text = Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
            string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";

            XmlDocument document = new XmlDocument();
            document.InnerXml = "<root>" + value + "</root>";

            XmlNode innerNode = document.DocumentElement.FirstChild;
            innerNode.InnerText = text;

            return innerNode;
        }
        set
        { }
    }

And this gives me:

<TextOutlTxt>
  <p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;</p>
</TextOutlTxt>

So I'm close, but no cigar. There is an unwanted xmlns="" attribute on the <p> element; it must not be present. In my XmlWriter, I have done the following to remove the namespace unless it is found atop the object being serialized:

    protected override void OnWrite<T>(T sourceData, Stream outputStream)
    {
        IKnownTypesLocator knownTypesLocator = KnownTypesLocator.Instance;

        //Let's see if we can get the default namespace
        XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();

        XmlSerializer serializer = null;

        if (xmlRootAttribute != null)
        {
            string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
            XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
            nameSpaces.Add(string.Empty, nameSpace);
            serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypesLocator.XmlItems.ToArray(), xmlRootAttribute, nameSpace);

            //Now we can serialize
            using (StreamWriter writer = new StreamWriter(outputStream))
            {
                serializer.Serialize(writer, sourceData, nameSpaces);
            }
        }
        else
        {
            serializer = new XmlSerializer(typeof(T), knownTypesLocator.XmlItems.ToArray());

            //Now we can serialize
            using (StreamWriter writer = new StreamWriter(outputStream))
            {
                serializer.Serialize(writer, sourceData);
            }
        }
    }

I'm sure I'm overlooking something. Any help would be greatly appreciated!

UPDATE 9/26/2017 So... I've been asked to provide more detail, specifically an explanation of the purpose of my code and a reproducible example. So here are both:

  1. The purpose of the XML. I am writing an interface UI between two systems. I read data from one, give users options to massage the data, and then give them the ability to export the data into files the second system can import. It concerns a bill of material system: system one consists of the CAD drawings and the objects in those drawings, and system two is an enterprise estimating system that is also being configured to support electronic bills of material. I was given the XMLs from the vendor to recreate.
  2. Fully functional example code. I've tried generalizing the code into a reproducible form.

    [XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
    public class OutlineText
    {
        private string _value;
    
        [XmlElement("TextOutlTxt")]
        public XmlNode Value
        {
            get
            {
                string text = _value;
                text = Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
                string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";
    
                XmlDocument document = new XmlDocument();
                document.InnerXml = "<root>" + value + "</root>";
    
                XmlNode innerNode = document.DocumentElement.FirstChild;
                innerNode.InnerText = text;
    
                return innerNode;
            }
            set
            { }
        }
    
        private OutlineText()
        { }
    
        public OutlineText(string text)
        {
            _value = text;
        }
    
    }
    
    public class XmlFileWriter
    {
        public void Write<T>(T sourceData, FileInfo targetFile) where T : class
        {
            //This is actually retrieved through a locator object, but surely no one will mind an empty
            //collection for the sake of an example
            Type[] knownTypes = new Type[] { };
    
            using (FileStream targetStream = targetFile.OpenWrite())
            {
                //Let's see if we can get the default namespace
                XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();
    
                XmlSerializer serializer = null;
    
                if (xmlRootAttribute != null)
                {
                    string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
                    XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
                    nameSpaces.Add(string.Empty, nameSpace);
                    serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypes, xmlRootAttribute, nameSpace);
    
                    //Now we can serialize
                    using (StreamWriter writer = new StreamWriter(targetStream))
                    {
                        serializer.Serialize(writer, sourceData, nameSpaces);
                    }
                }
                else
                {
                    serializer = new XmlSerializer(typeof(T), knownTypes);
    
                    //Now we can serialize
                    using (StreamWriter writer = new StreamWriter(targetStream))
                    {
                        serializer.Serialize(writer, sourceData);
                    }
                }
            }
        }
    }
    
    
    public static void Main()
    {
        OutlineText outlineText = new OutlineText(@"SUBSTA SF6 CIRCUIT BKR CONC FDN ""C""");
    
        XmlFileWriter fileWriter = new XmlFileWriter();
        fileWriter.Write<OutlineText>(outlineText, new FileInfo(@"C:\MyDirectory\MyXml.xml"));
    
    
        Console.ReadLine();
    }
    

The result produced:

<?xml version="1.0" encoding="utf-8"?>
<OutlTxt xmlns="http://www.mynamespace/09262017">
  <TextOutlTxt>
    <p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;</p>
  </TextOutlTxt>
</OutlTxt>

Edit 9/27/2017 Per the request in the solution below, a secondary issue I've run into is keeping the hexadecimal codes. To illustrate this issue based on the above example, let's say the text value is

SUBSTA SF6 CIRCUIT BKR CONC FDN "C"

The vendor file is expecting the literals to be in their hex code format like so

SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;

I've rearranged the sample code Value property to be like so:

    private string _value;

    [XmlAnyElement("TextOutlTxt", Namespace = "http://www.mynamespace/09262017")]
    public XElement Value
    {
        get
        {
            string value = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", "http://www.mynamespace/09262017", _value);


            string innerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", "http://www.mynamespace/09262017", value);

            XElement element = XElement.Parse(innerXml);

            //Remove redundant xmlns attributes
            foreach (XElement descendant in element.DescendantsAndSelf())
            {
                descendant.Attributes().Where(att => att.IsNamespaceDeclaration && att.Value == "http://www.mynamespace/09262017").Remove();
            }

            return element;
        }
        set
        {
            _value = value == null ? null : value.ToString();
        }
    }

If I use the code

 string text = Regex.Replace(element.Value, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));

to create the hex code values ahead of the XElement.Parse(), the XElement converts them back to their literal values. If I try to set the XElement.Value directly after the XElement.Parse() (or through SetValue()), it changes the &#x22; to &amp;#x22;. Not only that, but it seems to interfere with the element output and adds additional elements, throwing it all out of whack.

Edit 9/27/2017 #2 to clarify, the original implementation had a related problem, namely that the escaped text was re-escaped. I.e. I was getting

SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;

But wanted

SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;
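
The two behaviours described above can be reproduced in isolation. The following is only a minimal sketch; the class and method names are made up for illustration and are not part of the original project:

```csharp
using System;
using System.Xml.Linq;

public static class EntityRoundTripDemo
{
    // Parsing resolves the character reference, so the tree holds a plain
    // quote character and the writer emits it unescaped.
    public static string ParseThenPrint()
    {
        XElement span = XElement.Parse("<span>FDN &#x22;C&#x22;</span>");
        return span.ToString();
    }

    // Assigning pre-escaped text to Value treats every character literally,
    // so each ampersand is escaped again on output.
    public static string SetValueThenPrint()
    {
        var span = new XElement("span");
        span.Value = "FDN &#x22;C&#x22;";
        return span.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ParseThenPrint());    // <span>FDN "C"</span>
        Console.WriteLine(SetValueThenPrint()); // <span>FDN &amp;#x22;C&amp;#x22;</span>
    }
}
```

In both cases the escaping decision is made when the tree is written out, not when the text is stored, which is why neither route keeps the &#x22; form.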
dbc
bjhuffine
  • It's possible that you are completely over-engineering the problem. When I tried your code, I got `"SUBSTA SF6 CIRCUIT BKR CONC FDN &#x26;amp;#x22;C&#x26;amp;#x22;"` out of the other end by accessing the OuterXml property. When you say re-create the vendor's XML file, do you mean so that you can make requests, or so that you can intercept responses? Not clear enough. – Kodaloid Sep 26 '17 at 00:36
  • Your question is unclear because you don't show how you initialize `_value` or what your container type looks like. But is this what you need? https://dotnetfiddle.net/VtdFky If you provided a [mcve] I would be able to answer definitively. – dbc Sep 26 '17 at 01:15
  • Your code is html not xml. The file was encoded properly for html. See wiki : https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references. You can use System.Net.WebUtility.HtmlDecode() and System.Net.WebUtility.HtmlEncode() – jdweng Sep 26 '17 at 06:28
  • @jdweng The document in the question is valid XML. Further, I've no idea what you think `HtmlDecode` and `HtmlEncode` do, but I'm prepared to bet that you don't know what they do, or that you don't understand this question at all. – David Heffernan Sep 26 '17 at 07:49
  • David : You are wrong again. The <p> tag is usually html. – jdweng Sep 26 '17 at 10:27
  • @Kodaloid Yeah, I guess I wasn't providing a lot of detail because I didn't want to overwhelm the issue. I'm writing an interface UI between two systems. I have to generate the xml that the vendor needs for the second system based on data I've read from a first system. – bjhuffine Sep 26 '17 at 11:35
  • @jdweng sorry, but thank you anyways. The HtmlEncode would be great for encoding a string to be compatible with html text, but that's not the issue here. I've got that to work out in the format the vendor is expecting. Now I just want to remove the namespace attribute created as a result. – bjhuffine Sep 26 '17 at 11:43
  • @dbc _value is just a string value instantiated from the constructor. I've updated to include a more detailed description of the product, purpose of the xml, and reproducible code. Hopefully this meets the standard of whoever has decided to bump my question's score down and it can be bumped back up. I'm now going to look at what you provided on dotnetfiddle and see how it applies. – bjhuffine Sep 26 '17 at 13:10
  • You know... I just realized that my hex representation of the literals was changed too... good gravy... the &#x22; became &amp;#x22; – bjhuffine Sep 26 '17 at 13:21
  • @jdweng Nope, this is valid XML. It might also be HTML. But the two are not mutually exclusive. You surely have heard of XHTML. Put the content into an XML validator if you want to check whether or not it is valid XML. – David Heffernan Sep 26 '17 at 14:37
  • @dbc Your suggestion seems to return the element with an empty xmlns attribute as well. – bjhuffine Sep 26 '17 at 18:00

2 Answers

1

The reason you are getting xmlns="" added to your embedded XML is that your container elements <OutlTxt> and <TextOutlTxt> are declared to be in the "http://www.mynamespace/09262017" namespace via the XmlRootAttribute.Namespace property, whereas the embedded literal XML elements are in the empty namespace. To fix this, your embedded XML literal must be in the same namespace as its parent elements.

Here is the XML literal from your Value getter. Notice there is no xmlns="..." declaration anywhere in the XML:

<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>ReplaceMe</span>
</p>

Lacking such a declaration, the <p> element is in the empty namespace. Conversely, your OutlineText type is decorated with an [XmlRoot] attribute:

[XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
public class OutlineText
{
}

Thus the corresponding OutlTxt root element will be in the http://www.mynamespace/09262017 namespace. All its child elements will default to this namespace as well unless overridden. Placing the embedded XmlNode in the empty namespace counts as overriding the parent namespace, and so an xmlns="" attribute is required.
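
The effect can be seen in isolation with LINQ to XML. This is only an illustrative sketch: the class name NamespaceDemo and the placeholder text are assumptions, and the namespace URI is the one from the question.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Builds <OutlTxt> in the question's namespace with a <p> child whose
    // namespace is chosen by the caller, and returns the serialized XML.
    public static string Build(XNamespace childNamespace)
    {
        XNamespace ns = "http://www.mynamespace/09262017";
        var root = new XElement(ns + "OutlTxt",
            new XElement(childNamespace + "p", "text"));
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Child in the empty namespace: xmlns="" is emitted to undo the
        // default namespace inherited from the parent.
        Console.WriteLine(Build(XNamespace.None));

        // Child in the parent's namespace: no extra declaration is needed.
        Console.WriteLine(Build("http://www.mynamespace/09262017"));
    }
}
```

The first call prints a <p xmlns=""> child, while the second prints a plain <p>, which is the shape the vendor file expects.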

The simplest way to avoid this problem is for your embedded XML string literal to place itself in the correct namespace as follows:

<p xmlns="http://www.mynamespace/09262017" style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>ReplaceMe</span>
</p>

Then, in your Value property getter, strip redundant namespace declarations. This is somewhat easier to do with the LINQ to XML API:

[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText
{
    public const string Namespace = "http://www.mynamespace/09262017";

    private string _value;

    [XmlAnyElement("TextOutlTxt", Namespace = OutlineText.Namespace)]
    public XElement Value
    {
        get
        {
            var escapedValue = EscapeTextValue(_value);

            var nestedXml = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", Namespace, escapedValue);
            var outerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", Namespace, nestedXml);

            var element = XElement.Parse(outerXml);

            //Remove redundant xmlns attributes
            element.DescendantsAndSelf().SelectMany(e => e.Attributes()).Where(a => a.IsNamespaceDeclaration && a.Value == Namespace).Remove();

            return element;
        }
        set
        {
            _value = value == null ? null : value.Value;
        }
    }

    static string EscapeTextValue(string text)
    {
        return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
    }

    private OutlineText()
    { }

    public OutlineText(string text)
    {
        _value = text;
    }
}

And the resulting XML will look like:

<OutlTxt xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.mynamespace/09262017">
  <TextOutlTxt>
    <p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
      <span>SUBSTA SF6 CIRCUIT BKR CONC FDN "C"</span>
    </p>
  </TextOutlTxt>
</OutlTxt>

Note that I have changed the attribute on Value from [XmlElement] to [XmlAnyElement]. I did this because it appears your value XML might contain multiple mixed content nodes at the root level, e.g.:

Start Text <p>Middle Text</p> End Text

Using [XmlAnyElement] enables this by allowing a container node to be returned without causing an extra level of XML element nesting.

Sample working .Net fiddle.

dbc
  • You're fantastic, friend. I've been in a meeting most of the day, but have been able to research more on your suggestion and have implemented it in both the sample code and the actual project. It works like a charm. Your approach makes very good sense. If you don't mind me asking, what about when I change the string value to use the hexadecimal character codes for the string literals? I've rearranged the code to set the XElement.Value to the replaced value because the XElement.Parse() converts them back to their literal values. Setting the Value property directly results in '&' = '&amp;' instead. – bjhuffine Sep 27 '17 at 18:34
  • If it makes sense, I can post another question separately from this. – bjhuffine Sep 27 '17 at 18:35
  • @bjhuffine - Not sure. Can you edit your question to demonstrate the problem with escaping? Basically I'd need to see a specific value for `_value` (i.e. a call to `new OutlineText(...)`) that reproduces the problem along with the exact XML you want to generate. – dbc Sep 27 '17 at 18:40
  • I have edited it accordingly. A specific example would be if _value contained an escaped character like the double quote. So with the XElement.Parse() the &#x22; becomes the double-quote value. If I try to set the element.Value property value directly, the &#x22; becomes &amp;#x22;. Not only that, but it seems to mess around with the element structure as well. – bjhuffine Sep 27 '17 at 19:15
  • @bjhuffine - I just reran your original code. Just to confirm, for the text value it outputs **`SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;`** but what you want is **`SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;`**. Is that correct? – dbc Sep 27 '17 at 20:47
1

Your question now has two requirements:

  1. Suppress certain xmlns="..." attributes on an embedded XElement or XmlNode while serializing, AND

  2. Force certain characters inside element text to be escaped (e.g. " => &#x22;). Even though this is not required by the XML standard, your legacy receiving system apparently needs this.

Issue #1 can be addressed as in this answer

For issue #2, however, there is no way to force certain characters to be unnecessarily escaped using XmlNode or XElement because escaping is handled at the level of XmlWriter during output. And Microsoft's built-in implementations of XmlWriter seem not to have any settings that can force certain characters that do not need to be escaped to nevertheless be escaped. You would need to try to subclass XmlWriter or XmlTextWriter (as described e.g. here and here) then intercept string values as they are written and escape quote characters as desired.
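
To illustrate the subclassing idea, here is a rough sketch rather than a tested solution. QuoteEscapingXmlWriter and QuoteEscapingDemo are hypothetical names; the sketch only intercepts text written through WriteString() (not WriteChars() or CDATA), only rewrites the double quote, and leaves attribute values to the base class:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical writer that emits double quotes in element text as &#x22;.
public class QuoteEscapingXmlWriter : XmlTextWriter
{
    public QuoteEscapingXmlWriter(TextWriter writer) : base(writer) { }

    public override void WriteString(string text)
    {
        // Leave attribute values and empty strings to the base implementation.
        if (string.IsNullOrEmpty(text) || WriteState == WriteState.Attribute)
        {
            base.WriteString(text);
            return;
        }

        // Write the text between quotes normally (so &, < and > are still
        // escaped by the base class) and write each quote as a raw reference.
        string[] parts = text.Split('"');
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) base.WriteRaw("&#x22;");
            if (parts[i].Length > 0) base.WriteString(parts[i]);
        }
    }
}

public static class QuoteEscapingDemo
{
    public static string Run()
    {
        var output = new StringWriter();
        using (var writer = new QuoteEscapingXmlWriter(output))
        {
            writer.WriteStartElement("span");
            writer.WriteString("FDN \"C\"");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // <span>FDN &#x22;C&#x22;</span>
    }
}
```

An instance of such a writer could then be passed to XmlSerializer.Serialize() in place of the StreamWriter. Whether the vendor system needs other characters treated the same way is something only the vendor files can confirm.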

Thus, as an alternate approach that solves both #1 and #2, you could implement IXmlSerializable and write your desired XML directly with XmlWriter.WriteRaw():

[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText : IXmlSerializable
{
    public const string Namespace = "http://www.mynamespace/09262017";

    private string _value;

    // For debugging purposes.
    internal string InnerValue { get { return _value; } }

    static string EscapeTextValue(string text)
    {
        return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
    }

    private OutlineText()
    { }

    public OutlineText(string text)
    {
        _value = text;
    }

    #region IXmlSerializable Members

    XmlSchema IXmlSerializable.GetSchema()
    {
        return null;
    }

    void IXmlSerializable.ReadXml(XmlReader reader)
    {
        _value = ((XElement)XNode.ReadFrom(reader)).Value;
    }

    void IXmlSerializable.WriteXml(XmlWriter writer)
    {
        var escapedValue = EscapeTextValue(_value);
        var nestedXml = string.Format("<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{0}</span></p>", escapedValue);
        writer.WriteRaw(nestedXml);
    }

    #endregion
}

And the output will be

<OutlTxt xmlns="http://www.mynamespace/09262017"><p style="text-align:left;margin-top:0pt;margin-bottom:0pt;"><span>SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</span></p></OutlTxt>

Note that, if you use WriteRaw(), you can easily generate invalid XML simply by writing markup characters embedded in text values. You should be sure to add unit tests that verify this does not occur, e.g. that new OutlineText(@"<") does not cause problems. (A quick check seems to show your Regex is escaping < and > appropriately.)
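
A check along these lines could look like the following sketch. RawXmlCheck and its methods are hypothetical helpers, the escaping rule is the same Regex as above written slightly more compactly, and the fragment omits the style attribute for brevity. Note one caveat it exposes: the Regex also turns control characters such as \a into references like &#x7;, which are not legal characters in XML 1.0, so a conforming parser rejects them.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class RawXmlCheck
{
    // Same character class as EscapeTextValue in the answer.
    public static string EscapeTextValue(string text)
    {
        return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]",
            m => string.Format("&#x{0:X};", (int)m.Value[0]));
    }

    // The raw markup that would be handed to WriteRaw().
    public static string BuildFragment(string text)
    {
        return "<p><span>" + EscapeTextValue(text) + "</span></p>";
    }

    // Parses the fragment and returns the text a reader would see.
    public static string RoundTrip(string text)
    {
        return XElement.Parse(BuildFragment(text)).Value;
    }

    // True when the fragment parses as well-formed XML.
    public static bool IsWellFormed(string text)
    {
        try
        {
            XElement.Parse(BuildFragment(text));
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("<") == "<");          // markup characters survive
        Console.WriteLine(RoundTrip("a & \"b\"") == "a & \"b\"");
        Console.WriteLine(IsWellFormed("\a"));             // control character case
    }
}
```

If the source data can contain such control characters, they would need to be stripped or handled separately before writing.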

New sample .Net fiddle.

dbc
  • Thanks! I didn't even realize I could intercept the write process like this. Your assistance has been more than helpful. I need to add you to my Christmas list! Unfortunately, I can only mark one as an answer, but I did upvote this one. Tomorrow I'll work on some creative refactoring to minimize its impact. – bjhuffine Sep 27 '17 at 23:21