104

Is there a C# function that escapes and un-escapes a string so that it can be used as the content of an XML element?

I am using VSTS 2008 + C# + .Net 3.0.

EDIT 1: I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand. For example, to put a<b into <foo></foo>, I need to escape the string a<b and put it into the foo element.

jessehouwing
George2
  • Shortest I can think of: `new XText(unescaped).ToString()` – sehe Dec 10 '12 at 17:31
  • For anyone else stumbling upon this, I've found this to be the best answer: http://stackoverflow.com/a/5304827/1224069 – Philip Pittle May 16 '15 at 01:31
  • Not a single way, but here are a few: [http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx](http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx) – marcc Jul 15 '09 at 16:33
  • @sehe No, that does not escape a string. It merely serializes a text node that still contains the same characters. – Suncat2000 Mar 17 '22 at 12:21
  • @Suncat2000 I'll link to the more insightful comments at the corresponding answer instead https://stackoverflow.com/a/19498780/85371 – sehe Mar 17 '22 at 12:49

11 Answers

142

SecurityElement.Escape(string s)
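
A minimal sketch of how this call could be used for the `a<b` case from the question. The class name and the `WrapInElement` helper are hypothetical and exist only for illustration; the only framework call is `SecurityElement.Escape`, which replaces `<`, `>`, `&`, `"` and `'` with entity references.

```csharp
using System;
using System.Security;

public static class SecurityElementEscapeDemo
{
    // Hypothetical helper: builds <elementName>text</elementName> by hand,
    // escaping the text first so the result stays well-formed.
    public static string WrapInElement(string elementName, string rawText)
    {
        // SecurityElement.Escape replaces <, >, &, " and ' with entity references.
        string escaped = SecurityElement.Escape(rawText);
        return "<" + elementName + ">" + escaped + "</" + elementName + ">";
    }
}
```

Calling `SecurityElementEscapeDemo.WrapInElement("foo", "a<b")` should give `<foo>a&lt;b</foo>`. Note that this class offers no matching un-escape method, so decoding still needs one of the reader-based approaches.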

TWA
83
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerText = unescaped;
    return node.InnerXml;
}

public static string XmlUnescape(string escaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerXml = escaped;
    return node.InnerText;
}
Stefan Steiger
Darin Dimitrov
  • You don't even need to append the element to the document. However, I'd still say that it's best not to try to do this in the first place - it sounds like George is making work for himself by doing things by hand... – Jon Skeet Jul 15 '09 at 17:01
  • Completely agree with you Jon. I didn't know that it wasn't necessary to append the node to make it work. That's why I love StackOverflow - I learn so many things every day. – Darin Dimitrov Jul 15 '09 at 17:05
  • I really dislike this answer because it's too heavy-weight. XmlDocument is going to use XmlReader/XmlWriter to do the real work, so why not cut to the chase and avoid that heavy DOM? – Steven Sudit Jul 15 '09 at 19:49
  • This answer also doesn't escape quotes. Fail. –  Mar 15 '10 at 15:59
  • @Will, simple quotes don't need to be escaped like double quotes. What this sample guarantees is that it will generate valid XML no matter what you put in the string. – Darin Dimitrov Mar 15 '10 at 17:32
  • @darin I usually say "single quote" and "quote", I guess you say "quote" and "double quote." I did mean, as you say, "double quotes" as your escape method does not escape "double quotes", so your function does not correctly escape text for use in an XML document. The use of the SecurityElement to escape text and your unescape method works well enough, however. So you're half right. –  Mar 16 '10 at 12:18
  • @Will, the OP asked for a function that will escape a text which could be put in a XML **element** and not attribute. My function doesn't escape single or double quotes because they can be put in XML elements. – Darin Dimitrov Mar 16 '10 at 12:24
  • @darin good point, and one that should be stressed. I am satisfied with the result of this conversation, and withdraw my reservations. Good day, sir. –  Mar 16 '10 at 13:15
  • I wonder if `HttpUtility.HtmlEncode` from `System.Web` could safely be used? – Pooven May 01 '13 at 18:41
  • @Pooven I would avoid it, especially for ampersand issues and occasional unicode issues. HtmlEncoding sometimes does ä -> &auml; or © -> &copy; conversions, which break the XML. The exact implementation and result of HtmlEncoding depends on the method used; there are many different HtmlEncoding ways in .net, all of them different, and I don't think any of them is guaranteed to give XML-compatible results. – Joel Peltonen Jun 11 '13 at 07:06
  • An explicit note about the difference between the two functions seems warranted, since they look almost identical. – jpmc26 Mar 18 '15 at 03:36
  • This works on Windows Mobile (where System.Security is not available). – Andrea Colleoni Jun 09 '15 at 13:02
  • @William `;` and `=` are not characters that need to be escaped in XML. – JLRishe Oct 30 '16 at 18:02
  • This does not escape double quotes – Luca Ziegler Mar 05 '21 at 12:12
44

EDIT: You say "I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand".

I would strongly advise you not to do it by hand. Use the XML APIs to do it all for you - read in the original files, merge the two into a single document however you need to (you probably want to use XmlDocument.ImportNode), and then write it out again. You don't want to write your own XML parsers/formatters. Serialization is somewhat irrelevant here.
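
A rough sketch of that kind of merge, assuming both inputs are small well-formed documents and that "merging" means copying the child elements of one root into the other. The class and method names are hypothetical; `XmlDocument.LoadXml`, `ImportNode` and `AppendChild` are the only API calls involved.

```csharp
using System.Xml;

public static class XmlMergeSketch
{
    // Hypothetical helper: copies every child node of the second document's
    // root element into the first document's root element.
    public static string MergeRoots(string firstXml, string secondXml)
    {
        XmlDocument first = new XmlDocument();
        first.LoadXml(firstXml);

        XmlDocument second = new XmlDocument();
        second.LoadXml(secondXml);

        foreach (XmlNode child in second.DocumentElement.ChildNodes)
        {
            // ImportNode creates a copy owned by "first"; true requests a deep copy.
            XmlNode imported = first.ImportNode(child, true);
            first.DocumentElement.AppendChild(imported);
        }

        // Serialization re-escapes any special characters in text nodes.
        return first.OuterXml;
    }
}
```

For example, `XmlMergeSketch.MergeRoots("<root><a>1</a></root>", "<root><b>x &amp; y</b></root>")` should return `<root><a>1</a><b>x &amp; y</b></root>`, with no manual escaping anywhere in the calling code.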

If you can give us a short but complete example of exactly what you're trying to do, we can probably help you to avoid having to worry about escaping in the first place.


Original answer

It's not entirely clear what you mean, but normally XML APIs do this for you. You set the text in a node, and it will automatically escape anything it needs to. For example:

LINQ to XML example:

using System;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XElement element = new XElement("tag",
                                        "Brackets & stuff <>");

        Console.WriteLine(element);
    }
}

DOM example:

using System;
using System.Xml;

class Test
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement element = doc.CreateElement("tag");
        element.InnerText = "Brackets & stuff <>";
        Console.WriteLine(element.OuterXml);
    }
}

Output from both examples:

<tag>Brackets &amp; stuff &lt;&gt;</tag>

That's assuming you want XML escaping, of course. If you don't, please post more details.

Jon Skeet
  • Thanks Jon, I have put more details into my original post EDIT 1 section. Appreciate if you could give me some comments and advice. :-) – George2 Jul 15 '09 at 16:40
  • "after XML escaping" -- you mean? Could you speak in some other words please? English is not my native language. :-) – George2 Jul 15 '09 at 16:41
  • Hi Jon, how to un-escape from XML format into normal string format, i.e. from input "Brackets &amp; stuff &lt;&gt;", we get output "Brackets & stuff <>"? – George2 Jul 15 '09 at 16:50
  • @George2: You ask the XElement for its Value, or the XmlElement for its InnerText. – Jon Skeet Jul 15 '09 at 16:56
28

Thanks to @sehe for the one-line escape:

var escaped = new System.Xml.Linq.XText(unescaped).ToString();

I add to it the one-line un-escape:

var unescapedAgain = System.Xml.XmlReader.Create(new System.IO.StringReader("<r>" + escaped + "</r>")).ReadElementString();
Keith Robertson
11

George, it's simple. Always use the XML APIs to handle XML. They do all the escaping and unescaping for you.

Never create XML by appending strings.
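
As an illustration of what that could look like for the `<foo>` case in the question, here is a minimal sketch using `XmlWriter`. The class and method names are hypothetical, and omitting the XML declaration is an assumption made only to keep the output short.

```csharp
using System.Text;
using System.Xml;

public static class XmlWriterSketch
{
    // Hypothetical helper: writes <elementName>text</elementName> and lets
    // the writer take care of escaping the text.
    public static string BuildElement(string elementName, string text)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // assumption: only the element itself is wanted

        StringBuilder builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            // WriteElementString escapes characters such as < and & in the text.
            writer.WriteElementString(elementName, text);
        }

        return builder.ToString();
    }
}
```

`XmlWriterSketch.BuildElement("foo", "a<b")` should produce `<foo>a&lt;b</foo>`.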

John Saunders
  • Words to live by. There are many XML API options available, but the one thing we should all agree on is that manual string concatenation is not acceptable. – Steven Sudit Jul 15 '09 at 19:51
  • While I generally agree with this, there may be some very rare cases where manual escaping may be necessary. For example, while creating XML documentation using Roslyn. – svick May 01 '12 at 15:56
  • @svick: why not create the XML using LINQ to XML, and then use .ToString()? – John Saunders May 01 '12 at 16:37
  • @JohnSaunders, because Roslyn has its own set of XML classes, like `XmlElementSyntax`. And it's also complicated by the fact that you need to generate the `///` too. And I can't generate each line as a separate `XObject`, because that wouldn't work for multiline tags. – svick May 01 '12 at 17:07
  • @svick: so generate the xml, all on one line, stick `///` in front of it, then reformat the code. Not a huge big deal, and certainly very much a corner case. If absolutely necessary, I'm sure you could create a custom `XmlWriter` to do line breaks and whitespace the way you'd like, but placing `///` in front of new lines. Alternatively, use an XSLT to pretty-print the XML. But in any case, XML should still be generated by an XML API. – John Saunders May 01 '12 at 19:41
10

If, like me when I found this question, you want to escape XML node names (for example, when reading from an XML serialization), the easiest way is:

XmlConvert.EncodeName(string nameToEscape)

It will also escape spaces and any other characters that are not valid in XML element names.

http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28VS.80%29.aspx
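
A small sketch of name encoding in practice, assuming the goal is an element whose name comes from arbitrary text. The class and method names are hypothetical; `XmlConvert.EncodeName`, `XmlConvert.DecodeName` and `XElement` are the real APIs used.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class NameEncodingSketch
{
    // Hypothetical helper: encodes the raw name so it is a valid element
    // name, and lets XElement escape the text content.
    public static string BuildElement(string rawName, string text)
    {
        // EncodeName turns invalid name characters into _xHHHH_ sequences.
        string safeName = XmlConvert.EncodeName(rawName);
        return new XElement(safeName, text).ToString();
    }

    // Reverses EncodeName to recover the original name.
    public static string DecodeElementName(string encodedName)
    {
        return XmlConvert.DecodeName(encodedName);
    }
}
```

`NameEncodingSketch.BuildElement("Order Details", "a<b")` should give `<Order_x0020_Details>a&lt;b</Order_x0020_Details>`. Name encoding and text escaping are separate concerns here: `EncodeName` is only meant for names, not for element content.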

CharlieBrown
  • I think, based on the questions, that they just want inner text. Your solution will work, but is somewhat overkill as it's intended to also handle things like element and attribute names. – Sean Duggan Mar 19 '14 at 16:20
  • Well, I got here trying to escape node names and thought my findings could help anybody in the future. I also don't see what's the "overkill" but it's OK. ;) – CharlieBrown Mar 20 '14 at 10:30
  • Oh, it's useful information. :) I just figured I'd point out that one of the reasons you might not have gotten upvoted was because people might feel that you're not answering the question at hand. – Sean Duggan Mar 20 '14 at 11:35
  • The link leads to docs for SecurityElement.Escape(String), was this intentional? XmlConvert.EncodeName(String) has its own page. I know it has been a few years since this was asked, but how do I know which one to use? Don't they do the same thing but in different ways? – micnil Sep 17 '18 at 12:31
  • @CharlieBrown: Maybe you also want to create a separate question out of it and answer it, so people can better find it. Thanks for posting it! – Florian Straub Oct 21 '20 at 05:58
7

Another take based on Jon Skeet's answer that doesn't return the tags:

void Main()
{
    XmlString("Brackets & stuff <> and \"quotes\"").Dump();
}

public string XmlString(string text)
{
    return new XElement("t", text).LastNode.ToString();
} 

This returns just the value passed in, in XML encoded format:

Brackets &amp; stuff &lt;&gt; and "quotes"
Rick Strahl
5

WARNING: Necromancing

Still Darin Dimitrov's answer + System.Security.SecurityElement.Escape(string s) isn't complete.

In XML 1.1, the simplest and safest way is to just encode EVERYTHING.
Like &#09; for \t.
It isn't supported at all in XML 1.0.
For XML 1.0, one possible workaround is to base-64 encode the text containing the character(s).

//string EncodedXml = SpecialXmlEscape("привет мир");
//Console.WriteLine(EncodedXml);
//string DecodedXml = XmlUnescape(EncodedXml);
//Console.WriteLine(DecodedXml);
public static string SpecialXmlEscape(string input)
{
    if (string.IsNullOrEmpty(input))
        return input;

    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    for (int i = 0; i < input.Length; ++i)
    {
        int codePoint;

        if (char.IsSurrogatePair(input, i))
        {
            // Characters outside the BMP occupy two UTF-16 chars;
            // emit one character reference for the whole code point.
            codePoint = char.ConvertToUtf32(input[i], input[i + 1]);
            ++i; // skip the low surrogate
        }
        else
        {
            codePoint = input[i];
        }

        // Write every character as a numeric character reference.
        sb.AppendFormat("&#{0};", codePoint);
    }

    return sb.ToString();
} // End Function SpecialXmlEscape

XML 1.0:

public static string Base64Encode(string plainText)
{
    var plainTextBytes = System.Text.Encoding.UTF8.GetBytes(plainText);
    return System.Convert.ToBase64String(plainTextBytes);
}

public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}
Stefan Steiger
3

The following functions will do the work. I didn't test them against XmlDocument, but I guess this is much faster.

public static string XmlEncode(string value)
{
    System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings 
    {
        ConformanceLevel = System.Xml.ConformanceLevel.Fragment
    };

    System.Text.StringBuilder builder = new System.Text.StringBuilder();

    using (var writer = System.Xml.XmlWriter.Create(builder, settings))
    {
        writer.WriteString(value);
    }

    return builder.ToString();
}

public static string XmlDecode(string xmlEncodedValue)
{
    System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings
    {
        ConformanceLevel = System.Xml.ConformanceLevel.Fragment
    };

    using (var stringReader = new System.IO.StringReader(xmlEncodedValue))
    {
        using (var xmlReader = System.Xml.XmlReader.Create(stringReader, settings))
        {
            xmlReader.Read();
            return xmlReader.Value;
        }
    }
}
2

Using a third-party library (Newtonsoft.Json) as an alternative:

public static string XmlEscape(string unescaped)
{
    if (unescaped == null) return null;
    return JsonConvert.SerializeObject(unescaped);
}

public static string XmlUnescape(string escaped)
{
    if (escaped == null) return null;
    return JsonConvert.DeserializeObject(escaped, typeof(string)).ToString();
}

Examples of escaped string:

a<b ==> "a&lt;b"

<foo></foo> ==> "&lt;foo&gt;&lt;/foo&gt;"

NOTE: In newer versions, the code written above may not work with escaping, so you need to specify how the strings will be escaped:

public static string XmlEscape(string unescaped)
{
    if (unescaped == null) return null;
    return JsonConvert.SerializeObject(unescaped, new JsonSerializerSettings()
    {
        StringEscapeHandling = StringEscapeHandling.EscapeHtml
    });
}

Examples of escaped string:

a<b ==> "a\u003cb"

<foo></foo> ==> "\u003cfoo\u003e\u003c/foo\u003e"

abberdeen
  • This generates JSON not XML? – Roland Pihlakas Dec 10 '20 at 04:59
  • This generates just escaped string. In fact, these functions can be used to "escape" and "unescape" the input string. Input string for escaping might be for ex. HTML or XML. I've changed function name to make it more correct. – abberdeen Dec 10 '20 at 20:03
  • But XML should not have quotes around the string, which this function produces. Also, not all characters are escaped in XML compatible form. For example tab gets formatted as "\t". – Roland Pihlakas Dec 11 '20 at 05:39
  • Also, can you please point to a XML parsing function which is able to read the characters in the form of \uxxxx ? – Roland Pihlakas Dec 11 '20 at 05:43
  • This is working well for JSON, not XML. XML has more special characters like `&` that will be missed – nrofis Aug 11 '21 at 12:56
1

SecurityElement.Escape does this job for you

Use this method to replace invalid characters in a string before using the string in a SecurityElement. If invalid characters are used in a SecurityElement without being escaped, an ArgumentException is thrown.

The following table shows the invalid XML characters and their escaped equivalents.

< ==> &lt;
> ==> &gt;
" ==> &quot;
' ==> &apos;
& ==> &amp;

https://learn.microsoft.com/en-us/dotnet/api/system.security.securityelement.escape?view=net-5.0

AllmanTool
  • A link to a solution is welcome, but please ensure your answer is useful without it: [add context around the link](//meta.stackexchange.com/a/8259) so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. [Answers that are little more than a link may be deleted.](/help/deleted-answers) – STA Apr 15 '21 at 09:01