47

I have a string (from a CDATA element) that contains an escaped XML document. I need to decode it into a new string in which the characters display correctly, using C#.

Existing String:

&lt;?xml version="1.0" encoding="UTF-8" standalone="yes"?&gt;&lt;myreport xmlns="http://test.com/rules/client"&gt;&lt;admin&gt;&lt;ordernumber&gt;123&lt;/ordernumber&gt;&lt;state&gt;NY&lt;/state&gt;&lt;/report&gt;&lt;/myreport&gt;

String Wanted:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<myreport xmlns="http://test.com/rules/client">
<admin><ordernumber>123</ordernumber><state>NY</state></report></myreport>
Kirill Polishchuk
user31673

7 Answers

50
  1. HttpUtility.HtmlDecode from System.Web
  2. WebUtility.HtmlDecode from System.Net
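A minimal sketch of the second option, assuming the escaped text is already in a string. The class name `DecodeExample`, the `Decode` wrapper, and the shortened sample input are illustrative only and not part of the question.

```csharp
using System;
using System.Net;

public static class DecodeExample
{
    // Thin wrapper around WebUtility.HtmlDecode, which resolves
    // named entities (&lt;, &amp;, ...) and numeric character references.
    public static string Decode(string escaped)
    {
        return WebUtility.HtmlDecode(escaped);
    }

    public static void Main()
    {
        // Shortened stand-in for the escaped string from the CDATA element.
        string existing = "&lt;state&gt;NY&lt;/state&gt;";

        string wanted = Decode(existing);
        Console.WriteLine(wanted);
    }
}
```

`HttpUtility.HtmlDecode` is called the same way, but needs a reference to System.Web.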
Kirill Polishchuk
45

You can use System.Net.WebUtility.HtmlDecode instead of HttpUtility.HtmlDecode

This is useful if you don't want a System.Web reference and prefer System.Net instead.

Wojciech
matabares
  • 2
    Thanks! This is really handy, as I want to target the .NET 4.0 Client Profile, but referencing System.Web would require me to target the full .NET 4.0 profile. – Mal Ross Sep 25 '15 at 13:18
6

As Kirill and msarchet said, you can use HttpUtility.HtmlDecode from System.Web. It unescapes pretty much anything correctly.

If you don't want to reference System.Web, you can use a trick that supports all XML escaping but not HTML-specific entities such as &eacute;:

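using System.Xml;

// Wraps the escaped text in a dummy element so the XML parser decodes the entities.
// Throws XmlException for entities that XML does not define, such as &eacute;.
public static string XmlDecode(string value) {
    var xmlDoc = new XmlDocument();
    xmlDoc.LoadXml("<root>" + value + "</root>");
    return xmlDoc.InnerText;
}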

You could also use a Regex or a simple string.Replace, but that would only support the basic XML escapes. Numeric references like &#x410; or HTML entities like &eacute; would be harder to support that way.
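To illustrate the limits described above, here is a small sketch that runs the XML-parser trick on a numeric reference and on an HTML-only entity. The class name `XmlDecodeDemo` and the sample inputs are made up for this example.

```csharp
using System;
using System.Xml;

public static class XmlDecodeDemo
{
    // Same trick as above: let the XML parser decode the entities.
    public static string XmlDecode(string value)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml("<root>" + value + "</root>");
        return xmlDoc.InnerText;
    }

    public static void Main()
    {
        // Predefined XML entities and numeric references are decoded.
        Console.WriteLine(XmlDecode("&lt;b&gt; &amp; &#x410;"));

        try
        {
            // &eacute; is an HTML entity that plain XML does not declare.
            XmlDecode("caf&eacute;");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Not decodable as XML: " + ex.Message);
        }
    }
}
```

If the input may contain HTML entities, WebUtility.HtmlDecode is the safer choice.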

Wernight
1

HttpUtility.HtmlDecode(xmlString) will solve this issue

Jon Grant
Sharthak Ghosh
0

You can use Html.Raw. That way the markup is not encoded.

Andrei S
-1

You just need to replace the escaped characters with their originals.

// Replace &amp; last so that text like "&amp;gt;" is not decoded twice.
string stringWanted = existingString.Replace("&lt;", "<")
                                    .Replace("&gt;", ">")
                                    .Replace("&quot;", "\"")
                                    .Replace("&apos;", "'")
                                    .Replace("&amp;", "&");
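The order of the Replace calls matters when the input contains an escaped ampersand. The following sketch compares two orderings; the helper names `AmpFirst` and `AmpLast` are hypothetical and only cover three entities to keep the example short.

```csharp
using System;

public static class ReplaceOrderDemo
{
    // Decodes &amp; before the other entities, so "&amp;lt;" is decoded twice.
    public static string AmpFirst(string s)
    {
        return s.Replace("&amp;", "&").Replace("&lt;", "<").Replace("&gt;", ">");
    }

    // Decodes &amp; after the other entities, so each escape is undone once.
    public static string AmpLast(string s)
    {
        return s.Replace("&lt;", "<").Replace("&gt;", ">").Replace("&amp;", "&");
    }

    public static void Main()
    {
        // The literal text "&lt;" escaped once is "&amp;lt;".
        string input = "&amp;lt;";

        Console.WriteLine(AmpFirst(input)); // prints "<"
        Console.WriteLine(AmpLast(input));  // prints "&lt;"
    }
}
```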
Ghasem
  • Well that is very strange. I've just [produced an example](https://dotnetfiddle.net/vRfBTE) that I was expecting to demonstrate the problem, and it works precisely as desired. What makes it strange is that I *know* this exact situation is responsible for an XML parsing error in a codebase I maintain that I fixed *yesterday*. At least, I think it's exactly the same. I'll cancel the downvote and remove my original comment until I get a chance to check. – Tom W Jan 09 '16 at 11:00
-2

You might also consider the static Parse method of XDocument (in System.Xml.Linq). I'm not sure how it compares to the others mentioned here, but it seems to parse these strings well.

Once you get the resulting XDocument, you could turn around with ToString to get the string back:

// xmlString is assumed to hold well-formed XML; Parse throws XmlException otherwise.
string parsedString = XDocument.Parse(xmlString).ToString();
Noah Stahl