How to unescape special characters in c#

Question

I have the following code

XElement element = new XElement("test", "a&b");

where

element.LastNode contains the value "a&amp;b".

I want it to be "a&b".

How do I replace this?

I tried XElement element = new XElement("test", HttpUtility.HtmlDecode("a&b")); — user2392525, Oct 09 '14 at 09:36
A bare & is not a valid character in XML because it introduces an entity reference. &amp; is used to represent & in XML. — Krisztián Balla, Oct 09 '14 at 09:36
@user2392525 http://stackoverflow.com/questions/1473826/parsing-xml-with-ampersand — Thirisangu Ramanathan, Oct 09 '14 at 09:46

Jodrell · Accepted Answer · 2014-10-09T10:32:28.110

Wait a moment,

<test>a&b</test>

is not valid XML. You cannot make XML that looks like this. This is clarified by the XML standard.

& has special meaning: it introduces an escaped character that may otherwise be invalid. A literal '&' character is encoded as &amp; in XML.

For what it's worth, this is invalid HTML for the same reason.

<!DOCTYPE html> <html> <body> a&b </body> </html>

If I write the code,

const string Value = "a&b";
var element = new XElement("test", Value);
Debug.Assert(
    string.CompareOrdinal(Value, element.Value) == 0,
    "XElement is mad");

it runs without error: XElement encodes and decodes to and from XML as necessary.

To unescape or decode the XML element you simply read XElement.Value.
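As a minimal sketch of that round trip (the class and method names here are made up for illustration), the escaped form only appears in the serialized markup, while Value always gives back the plain text:

```csharp
using System;
using System.Xml.Linq;

public static class EscapeDemo
{
    // Serialized markup: XElement writes '&' as "&amp;" for you.
    public static string Serialized(string text)
    {
        return new XElement("test", text).ToString();
    }

    // Parsed markup: Value hands back the decoded text content.
    public static string Decoded(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    public static void Main()
    {
        Console.WriteLine(Serialized("a&b"));               // <test>a&amp;b</test>
        Console.WriteLine(Decoded("<test>a&amp;b</test>")); // a&b
    }
}
```

So if you see "a&amp;b" in the debugger or in ToString(), that is the serialized form; the in-memory text is still "a&b".") throw new Exception("serialized form mismatch");
if (EscapeDemo.Decoded("<test>a&amp;b</test>") != "a&b") throw new Exception("decoded value mismatch");
</test>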

If you want to make a document that looks like

<test>a&b</test>

you can, but it is not XML or HTML, and tools for working with HTML or XML won't intentionally help you. You'll have to make your own Readers, Writers and Parsers.
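If you want to check this yourself, here is a small sketch (IsWellFormed is a hypothetical helper name, not a framework method) that asks the LINQ to XML parser whether it accepts the text:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // True when the text parses as a well-formed XML element.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects a bare '&' that does not start a valid reference.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<test>a&b</test>"));     // False
        Console.WriteLine(IsWellFormed("<test>a&amp;b</test>")); // True
    }
}
```")) throw new Exception("bare ampersand should be rejected");
if (!WellFormedCheck.IsWellFormed("<test>a&amp;b</test>")) throw new Exception("escaped ampersand should parse");
</test>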

score 3 · Answer 2 · answered Oct 09 '14 at 09:50


The & is a reserved character, so it will always be encoded. That means you have to decode it.

Is this an option? HttpUtility.HtmlDecode Method (String)

Usage:

string decoded = HttpUtility.HtmlDecode("a&amp;b");
// returns "a&b"
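If you would rather not reference System.Web, System.Net.WebUtility.HtmlDecode does the same job for this case. The sketch below (class and helper names are hypothetical) also shows why HTML decoding and XML decoding are not interchangeable: &nbsp; is a named HTML entity, but it is not predefined in XML, so an XML parser rejects it unless a DTD declares it.

```csharp
using System;
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class DecodeComparison
{
    // HTML decoding: understands HTML named entities such as &nbsp;.
    public static string HtmlDecoded(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    // XML decoding: wraps the text in a throwaway element and reads Value.
    // Returns false when the text is not well-formed XML content.
    public static bool TryXmlDecode(string text, out string decoded)
    {
        try
        {
            decoded = XElement.Parse("<x>" + text + "</x>").Value;
            return true;
        }
        catch (XmlException)
        {
            decoded = string.Empty;
            return false;
        }
    }

    public static void Main()
    {
        string viaXml;
        Console.WriteLine(HtmlDecoded("a&amp;b"));                 // a&b
        Console.WriteLine(TryXmlDecode("a&amp;b", out viaXml));    // True
        Console.WriteLine(TryXmlDecode("a&nbsp;b", out viaXml));   // False
    }
}
```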

answered Oct 09 '14 at 09:50

hwcverwe


this works but i cannot assign this string value to Xelement.Lastnode – user2392525 Oct 09 '14 at 09:56
Of course you can. But it will be automatically encoded. – Krisztián Balla Oct 09 '14 at 09:59
You could just read `element.Value`. Be careful when HTML-decoding XML: it works fine for `&amp;`, but not all characters are escaped the same way by both standards. – Jodrell Oct 09 '14 at 10:37
@user2392525 adding `a&b` without encoding it would result in syntactically incorrect XML. `a&b` has a syntax error. `a&amp;b` is correct. It only requires you to decode the values – hwcverwe Oct 14 '14 at 11:22

score 0 · Answer 3 · answered Oct 09 '14 at 09:43

Try the following:

// Requires: using System.Text.RegularExpressions;
public static string GetTextFromHTML(string htmlstring)
{
    // Replace all tags with spaces...
    htmlstring = Regex.Replace(htmlstring, @"<(.|\n)*?>", " ");

    // ...then collapse runs of spaces into a single space
    while (htmlstring.Contains("  "))
    {
        htmlstring = htmlstring.Replace("  ", " ");
    }

    // Replace the non-breaking-space and ampersand entities with plain characters
    htmlstring = htmlstring.Replace("&nbsp;", " ");
    htmlstring = htmlstring.Replace("&amp;", "&");

    return htmlstring;
}
