Possible Duplicate:
How do I remove diacritics (accents) from a string in .NET?

Our project generates a string (Mērā nāma nitina hai) in a web page. When we read it with the Regex.Match function, the special characters come back as HTML character references such as &#257; in place of ā. We want to convert these back to 'ā' (or to a plain 'a') so that we can use the string in the rest of the program. Thanks.

Sushant Jain

3 Answers

I'm not sure that my method is absolutely right, but it works for me:

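using System.Text;
using System.Web;            // needs a reference to System.Web.dll

string first = @"M&#275;r&#257; n&#257;ma nitina hai";
first = HttpUtility.HtmlDecode(first);   // "&#257;" -> "ā"

// Round trip through code page 1252: characters it lacks (ā, ē) get a best-fit substitute (a, e)
Encoding ansiEncoding = Encoding.GetEncoding(1252);
byte[] ansi = Encoding.Convert(Encoding.Unicode, ansiEncoding, Encoding.Unicode.GetBytes(first));
string output = Encoding.Unicode.GetString(Encoding.Convert(ansiEncoding, Encoding.Unicode, ansi));
MessageBox.Show(output);                 // System.Windows.Forms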

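The main idea of this code: convert the string to the ANSI code page (1252) and back to Unicode. Characters that code page 1252 cannot represent, such as ā and ē, are replaced by their closest plain equivalents during the conversion, so those diacritics disappear. Accented letters that 1252 does contain, such as é or ü, come back unchanged.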

Anton Semenov

How about this:

var correctStr = HttpUtility.HtmlDecode(@"M&#275;r&#257; n&#257;ma nitina hai");

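Explanation: &#257; is an HTML numeric character reference for the accented character ā, whose Unicode code point is 257 (U+0101). HtmlDecode replaces each such reference with the character it stands for.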
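The explanation above can be checked with a short sketch. It assumes the encoded input reconstructed from the question, and it uses System.Net.WebUtility.HtmlDecode, which decodes the same numeric character references as HttpUtility.HtmlDecode but does not need a reference to System.Web.

```csharp
using System;
using System.Net;

// Assumed input: the question's text as it arrives, with numeric character references.
string encoded = "M&#275;r&#257; n&#257;ma nitina hai";

// HtmlDecode replaces each "&#NNN;" with the character whose code point is NNN.
string decoded = WebUtility.HtmlDecode(encoded);

// 'ā' is U+0101, i.e. 257 in decimal, which is why its reference reads "&#257;".
Console.WriteLine((int)'\u0101');          // 257
Console.WriteLine(decoded[3] == '\u0101'); // True
```

After decoding, the string holds real Unicode characters again, so it can be compared, searched, or further processed like any other .NET string.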

František Žiačik

You need to use the String.Normalize method.
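A minimal sketch of how String.Normalize can be used here: decompose with NormalizationForm.FormD, which splits ā into a plus a combining macron (U+0304), then drop the characters whose Unicode category is NonSpacingMark. The helper name RemoveDiacritics is hypothetical, and the WebUtility.HtmlDecode step is added on the assumption that the input still contains references such as &#257;.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical helper: decode HTML character references, then strip combining marks.
string RemoveDiacritics(string input)
{
    // "&#257;" -> 'ā' (U+0101)
    string decoded = WebUtility.HtmlDecode(input);

    // FormD splits 'ā' into 'a' followed by U+0304 COMBINING MACRON.
    string decomposed = decoded.Normalize(NormalizationForm.FormD);

    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        // Keep every character except non-spacing (combining) marks.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }

    // Normalize back to the composed form for any characters that remain decomposable.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacritics("M&#275;r&#257; n&#257;ma nitina hai")); // Mera nama nitina hai
```

Unlike the code page 1252 round trip, this removes accents from é and ü as well. Letters that have no canonical decomposition, such as ø or ł, pass through unchanged.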

Paolo Tedesco