How to convert from HTML encodings to UTF-8 while writing to a file?

Question

Possible Duplicate:
How can I decode HTML characters in C#?

I have a problem converting the string strTocheck below from its HTML-encoded form to the actual characters it represents, so that they can be written to a file as UTF-8.

// Code sample:    
string result = null;

// HTML-encoded Input String (From Google Translate API) , renders "भारत महान देश है." in Hindi language.
string strTocheck = "&#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.";

using (var sw = new StreamWriter(File.Open(@"c:\myfile.txt", FileMode.OpenOrCreate), Encoding.UTF8)) // UTF-8 encoding
{
    sw.WriteLine(strTocheck);
}

// Read the file back; the using block disposes the reader and releases the file handle.
using (var reader = new System.IO.StreamReader(@"c:\myfile.txt", Encoding.UTF8)) // UTF-8 encoding
{
    result = reader.ReadToEnd();
}
MessageBox.Show(result);

// I expected "भारत महान देश है."
// But got output : &#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.

Any help will be highly appreciated. Thank you.

Image here (please open it in a new tab) >> https://i.stack.imgur.com/xcctU.png

Your title is quite misleading, as ASCII is a proper subset of UTF-8 (both in terms of the actual encoding and the character set). — Joey, May 30 '12 at 15:16
`&#2349;` is not ASCII; it is the HTML encoding of a character (code point 2349). Run it through an HTML-decode utility. — Hans Keﬆing, May 30 '12 at 15:21
@Joey Well, I edited the title. You're right that ASCII is a subset of UTF-8, but you'll get the idea if you open the image above. — bharat1, May 30 '12 at 15:34
Thank you so much to everyone who contributed to resolving this! I would highly appreciate any links for studying this topic further. Thanks again :) — bharat1, Jun 01 '12 at 05:28

score 0 · Answer 1 · edited May 23 '17 at 11:59

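It is working as expected; you are just not using it as expected :p StreamWriter writes the characters of the string exactly as they are, so the numeric character references (&#2349; and so on) end up in the file unchanged. The string has to be HTML-decoded before it is written.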

See UTF8 not working in Excel

And http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/433ecab8-f800-4376-b351-4bbce93679d9 which links to MySQL C# Text Encoding Problems
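As a minimal sketch of the decoding step suggested in the comments, the string can be passed through `System.Net.WebUtility.HtmlDecode` (available since .NET Framework 4.0) before it is written. The class name `HtmlDecodeToFile`, the helper `DecodeWriteAndReadBack`, and the temp-file path are hypothetical names chosen for illustration; they are not part of the original code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class HtmlDecodeToFile
{
    // Hypothetical helper: decodes HTML character references, writes the
    // decoded text to the given path as UTF-8, and returns what is read back.
    public static string DecodeWriteAndReadBack(string encoded, string path)
    {
        // Turn references such as "&#2349;" into the characters they stand for.
        string decoded = WebUtility.HtmlDecode(encoded);

        // WriteAllText creates the file or overwrites its previous content.
        File.WriteAllText(path, decoded, Encoding.UTF8);

        // Read the file back with the same encoding.
        return File.ReadAllText(path, Encoding.UTF8);
    }

    public static void Main()
    {
        string strTocheck = "&#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.";

        // Assumed output location for this sketch; the question used c:\myfile.txt.
        string path = Path.Combine(Path.GetTempPath(), "myfile.txt");

        string result = DecodeWriteAndReadBack(strTocheck, path);

        // The console may not render Devanagari, depending on its font and
        // encoding; the file content is correct either way.
        Console.WriteLine(result);
    }
}
```

In a project that already references System.Web, `HttpUtility.HtmlDecode` should work in the same place. Note also that the question opens the file with `FileMode.OpenOrCreate`, which does not truncate an existing file, so a shorter string written over a longer one leaves stale bytes at the end; `File.WriteAllText` (or `FileMode.Create`) avoids that.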

Answered May 30 '12 at 15:22 by Jay; last edited May 23 '17 at 11:59 by Community.
