A web service returns XML as an XmlDocument, and it contains several HTML entities. I need to replace them with their equivalent Unicode characters before applying an XSLT transformation.

XML Snippet

<ics>
 <record>
  <metadata>
    <meta name="Abstract" content="In the series of compounds observed after effect of &amp;#947;-quanta"/>
  </metadata>
 </record>
</ics>

I'm using C# with .NET 4.0. I tried calling HttpUtility.HtmlDecode on the OuterXml property of the above XmlDocument, but it doesn't convert the HTML entities to Unicode characters.

How can this be achieved?

EDIT:

I see that applying HtmlDecode once turns &amp;#947; into &#947;. Applying it a second time gives the required Unicode character.

Is there a better way to do it?

itsbalur
  • According to http://stackoverflow.com/questions/8348879/decoding-all-html-entities it should work. What do you mean by "it doesn't convert [...] to Unicode"? – Bart Friederichs Dec 20 '12 at 07:20

1 Answer

In .NET 4.0, use WebUtility.HtmlDecode.

Also, &amp;#947; decodes to the literal text &#947;, not to the Unicode character γ. The main problem is that your "HTML" is incorrect: the entity is double-encoded, with the & of &#947; itself escaped as &amp;. You'll have to decode it twice to get the gamma character.
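
If decoding the whole OuterXml string twice feels fragile (it also unescapes any &lt; or &amp; that belongs to the markup, so the result may no longer be well-formed XML), one alternative is to decode the node values instead. The XML parser has already removed the first layer of escaping, so the attribute value in the loaded XmlDocument is the text &#947;, and a single HtmlDecode per node should be enough. This is only a minimal sketch: the class name EntityDecoder and the method name DecodeEntities are made up for illustration, and it assumes every attribute and text node in the document should be decoded.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml;

// Hypothetical helper, not part of the .NET Framework.
static class EntityDecoder
{
    // Decodes HTML entities in every attribute value and text node, in place.
    public static void DecodeEntities(XmlDocument doc)
    {
        // Materialize the node list first so the document is not
        // modified while the XPath result is still being enumerated.
        List<XmlNode> nodes = doc.SelectNodes("//@* | //text()")
                                 .Cast<XmlNode>()
                                 .ToList();

        foreach (XmlNode node in nodes)
        {
            // The parser already turned "&amp;#947;" into "&#947;",
            // so one HtmlDecode pass yields the actual character.
            node.Value = WebUtility.HtmlDecode(node.Value);
        }
    }
}
```

Calling EntityDecoder.DecodeEntities(doc) on the document from the question should leave the content attribute ending in "effect of γ-quanta", and the XmlDocument can then be passed to the XSLT transformation without going through a string.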

Bart Friederichs
  • Thanks, I tried using both WebUtility.HtmlDecode and HttpUtility.HtmlDecode on the OuterXml, but the resulting string still has the literal &#947; instead of its Unicode equivalent. – itsbalur Dec 20 '12 at 07:29