
Is there a way to convert a Unicode escape such as "U+4E0B" into the character it represents?

Example: {\U+FF8D\U+FF9E\U+FF9D\U+FF84\U+FF9E\U+FF97\U+FF72\U+FF9D - \U+4E0B}

Any help is appreciated!

Fuzzybear
Adi
  • Possible duplicate of [How to decode a Unicode character in a string](https://stackoverflow.com/questions/9303257/how-to-decode-a-unicode-character-in-a-string) – Fuzzybear Jul 20 '18 at 11:04
  • This may also help: https://stackoverflow.com/questions/4184190/unicode-to-string-conversion-in-c-sharp – Fuzzybear Jul 20 '18 at 11:05
  • @Fuzzybear Both of those questions have poor answers, and the format is different: the first one uses `\u00f6` instead of `\u+00f6`. – xanatos Jul 20 '18 at 11:12
  • Try: `System.Net.WebUtility.HtmlDecode(string)` – jdweng Jul 20 '18 at 11:34
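For reference, here is a quick check of what `WebUtility.HtmlDecode` actually decodes. This is a minimal sketch assuming .NET Core 3.0+ / .NET 5+ with top-level statements; the variable names are illustrative. `HtmlDecode` handles HTML character references such as `&#x4E0B;`, but the `\U+XXXX` notation from the question is not an HTML entity, so it passes through unchanged.

```csharp
using System;
using System.Net;

// HtmlDecode understands HTML numeric character references such as "&#x4E0B;".
string fromEntity = WebUtility.HtmlDecode("&#x4E0B;");

// The "\U+XXXX" notation is not an HTML character reference,
// so HtmlDecode leaves it as-is.
string fromUPlus = WebUtility.HtmlDecode(@"\U+4E0B");

Console.WriteLine(fromEntity == "\u4E0B"); // True
Console.WriteLine(fromUPlus);              // \U+4E0B
```

So `HtmlDecode` only helps if the input is first rewritten into `&#x....;` form.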



A simple way is to use a regex with Regex.Replace() and a MatchEvaluator delegate:

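using System;
using System.Text.RegularExpressions;

string str = @"\U+FF8D\U+FF9E\U+FF9D\U+FF84\U+FF9E\U+FF97\U+FF72\U+FF9D - \U+4E0BFooBar";

// Match a literal "\U+" followed by exactly four uppercase hex digits
var rx = new Regex(@"\\U\+([0-9A-F]{4})");

string str2 = rx.Replace(str, m =>
{
    // Parse the captured hex digits and return the corresponding UTF-16 char
    ushort u = Convert.ToUInt16(m.Groups[1].Value, 16);
    return ((char)u).ToString();
});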
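To check the behaviour on smaller inputs, the same approach can be wrapped in a helper. This is a sketch; the name `DecodeUPlusEscapes` is hypothetical, and it assumes top-level statements (C# 9+). Because the pattern takes exactly four hex digits, the "F" of "FooBar" is not absorbed into the preceding escape. Note that casting to `char` only covers code points up to U+FFFF; anything above that would need `char.ConvertFromUtf32` and a different way of delimiting the digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every "\U+XXXX" (exactly four uppercase hex digits)
// with the UTF-16 character it names, leaving all other text untouched.
static string DecodeUPlusEscapes(string input) =>
    Regex.Replace(input, @"\\U\+([0-9A-F]{4})",
        m => ((char)Convert.ToUInt16(m.Groups[1].Value, 16)).ToString());

string decoded = DecodeUPlusEscapes(@"\U+4E0BFooBar");
Console.WriteLine(decoded == "\u4E0BFooBar"); // True
```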

If you also want to accept lowercase input (so that \u+ff9e is valid too), use:

var rx = new Regex(@"\\[Uu]\+([0-9A-Fa-f]{4})");
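Alternatively, and as a sketch rather than part of the original answer, you can keep the original pattern and let the regex engine ignore case with `RegexOptions.IgnoreCase`. `RegexOptions.CultureInvariant` is added here as an assumption to avoid culture-specific case mapping. `Convert.ToUInt16(..., 16)` already accepts lowercase hex digits, so the evaluator does not need to change.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the original answer; IgnoreCase lets "\u+" and lowercase hex match too.
var rx = new Regex(@"\\U\+([0-9A-F]{4})",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

string str2 = rx.Replace(@"\u+ff9e\U+4E0B", m =>
{
    // Base-16 parsing is case-insensitive, so "ff9e" and "FF9E" give the same value.
    ushort u = Convert.ToUInt16(m.Groups[1].Value, 16);
    return ((char)u).ToString();
});

Console.WriteLine(str2 == "\uFF9E\u4E0B"); // True
```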
xanatos