Possible Duplicate:
How can I decode HTML characters in C#?

I have a problem converting the string strTocheck below from its HTML-encoded form to its actual UTF-8 representation.

// Code sample:    
string result = null;

// HTML-encoded input string (from the Google Translate API); renders as "भारत महान देश है." in Hindi.
string strTocheck = "&#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.";

using (var sw = new StreamWriter(File.Open(@"c:\myfile.txt", FileMode.OpenOrCreate), Encoding.UTF8)) // UTF-8 encoding
{
    sw.WriteLine(strTocheck);
}

System.IO.StreamReader reader = new System.IO.StreamReader(@"c:\myfile.txt", Encoding.UTF8); // UTF-8 encoding
result = reader.ReadToEnd();
MessageBox.Show(result);

// I expected "भारत महान देश है."
// But got output: &#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.

Any help will be highly appreciated. Thank you.

Image here (please open it in a new tab) >> https://i.stack.imgur.com/xcctU.png

bharat1
  • Your title is quite misleading, as ASCII is a proper subset of UTF-8 (both in terms of the actual encoding and the character set). – Joey May 30 '12 at 15:16
  • `&#2349;` is not ASCII, it's the HTML encoding of a character (code point 2349). Run it through a HtmlUnencode utility. – Hans Kesting May 30 '12 at 15:21
  • @Joey Well, I edited the title. You're right that ASCII is a subset of UTF-8, but you'll get the idea if you open the image above. – bharat1 May 30 '12 at 15:34
  • This is doing exactly what I would expect it to. – Security Hound May 30 '12 at 15:40
  • Thank you so much to everyone who contributed to resolving this! I'd highly appreciate any links for studying this further. Thanks again :) – bharat1 Jun 01 '12 at 05:28
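
To make the point from the comments concrete: a numeric character reference such as `&#2349;` is only the decimal code point of a character, spelled out in ASCII. The sketch below is illustrative; the class name `CodePointDemo` and the helper `FromCodePoint` are made up for this example.

```csharp
using System;

static class CodePointDemo
{
    // Turns a decimal code point (the number inside "&#...;") into the string it denotes.
    // FromCodePoint is a hypothetical helper name, not part of the original question.
    public static string FromCodePoint(int codePoint) => char.ConvertFromUtf32(codePoint);

    static void Main()
    {
        // 2349 decimal is U+092D, DEVANAGARI LETTER BHA.
        string bha = FromCodePoint(2349);
        Console.WriteLine(bha == "\u092D");   // True

        // The reference itself is seven ASCII characters, not one Hindi letter.
        Console.WriteLine("&#2349;".Length);  // 7
    }
}
```

Writing those seven characters to a UTF-8 file stores them as they are, which is why the file read back contains the references rather than Hindi text.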

1 Answer

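It is working as expected; you are just not using it as expected :p StreamWriter writes exactly the characters that are in the string, so the HTML character references come back unchanged. They are an HTML escape rather than a text encoding, and have to be HTML-decoded first, as the comments point out.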

See UTF8 not working in Excel

And http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/433ecab8-f800-4376-b351-4bbce93679d9 which links to MySQL C# Text Encoding Problems
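
A minimal sketch of the decoding step suggested in the comments, assuming `System.Net.WebUtility.HtmlDecode` is available (.NET Framework 4.0 or later; `System.Web.HttpUtility.HtmlDecode` is a comparable alternative). The class name `HtmlDecodeDemo`, the `Decode` wrapper, and the temp-file location are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HtmlDecodeDemo
{
    // Replaces HTML character references (e.g. "&#2349;") with the characters they denote.
    // Decode is a hypothetical wrapper around the real WebUtility.HtmlDecode call.
    public static string Decode(string htmlEncoded) => WebUtility.HtmlDecode(htmlEncoded);

    static void Main()
    {
        string strTocheck = "&#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344; &#2342;&#2375;&#2358; &#2361;&#2376;.";

        // Decode first, then write: the file now contains the Hindi text itself.
        string decoded = Decode(strTocheck);

        // Assumed location; the question used c:\myfile.txt.
        string path = Path.Combine(Path.GetTempPath(), "myfile.txt");
        File.WriteAllText(path, decoded, Encoding.UTF8);   // creates or overwrites the file
        string result = File.ReadAllText(path, Encoding.UTF8);

        // Compare against the expected text, written with escapes to avoid editor encoding issues.
        string expected = "\u092D\u093E\u0930\u0924 \u092E\u0939\u093E\u0928 \u0926\u0947\u0936 \u0939\u0948.";
        Console.WriteLine(result == expected);             // True
    }
}
```

`File.WriteAllText` is used here instead of `FileMode.OpenOrCreate` because the latter does not truncate an existing file, so leftovers from a longer earlier write could remain at the end.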

Jay