
I have a string of Unicode escape sequences read from a text file, and I want to display the actual characters.

For example:

\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b

When I read this string from the text file using StreamReader.ReadLine(), the \ is escaped to '\\', as in "\\u8ba1", which is not what I want.

As a result, the string is displayed exactly as it appears in the text file. What I want is to display the actual characters.

  1. How can I change "\\u8ba1" to "\u8ba1" in the resulting string?
  2. Or should I use a different reader to read the string?
gotqn
Hyzups
  • possible duplicate of [Why when I read from an XML document do I get \r\r\n\n etc etc?](http://stackoverflow.com/questions/5980968/why-when-i-read-from-an-xml-document-do-i-get-r-r-n-n-etc-etc) – dtb Dec 19 '11 at 08:22
  • You could provide encoding in the StreamReader constructor – Anand Dec 19 '11 at 08:24
  • possible duplicate of [How do convert unicode escape sequences to unicode characters in a .NET string](http://stackoverflow.com/questions/183907/how-do-convert-unicode-escape-sequences-to-unicode-characters-in-a-net-string) – dtb Dec 19 '11 at 08:41
  • See my answer to this problem here: [Evaluate escaped string in C#](http://stackoverflow.com/questions/6629020/evaluate-escaped-string) – deAtog Jan 13 '12 at 18:04

2 Answers


If you have a string like

var input1 = "\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

// input1 == "计算机•网络•技术类"

you don't need to unescape anything. It's just the string literal that contains the escape sequences, not the string itself.
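One way to see the difference is to compare a regular literal with a verbatim one. This is only a small sketch; the class and variable names are made up for illustration:

```csharp
using System;

public static class LiteralDemo
{
    public static void Main()
    {
        // Regular literal: the compiler turns each \uXXXX into a single character.
        var regular = "\u8ba1\u7b97\u673a";
        // Verbatim literal: the backslashes are kept, like text read from a file.
        var verbatim = @"\u8ba1\u7b97\u673a";

        Console.WriteLine(regular.Length);  // 3
        Console.WriteLine(verbatim.Length); // 18

        // A debugger shows the verbatim string as "\\u8ba1...", but that is only
        // C# literal notation: the string itself holds one backslash per escape.
        Console.WriteLine(verbatim[0] == '\\' && verbatim[1] == 'u'); // True
    }
}
```

So a string that shows up as "\\u8ba1" in the debugger most likely corresponds to the second case, where the file contains a literal backslash followed by `u8ba1`.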


If you have a string like

var input2 = @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

you can unescape it using the following regex:

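// Requires: using System.Globalization; using System.Text.RegularExpressions;
var result = Regex.Replace(
    input2,
    @"\\[Uu]([0-9A-Fa-f]{4})",  // a backslash, u or U, then exactly four hex digits
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));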

// result == "计算机•网络•技术类"
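
Applied to the situation in the question, the same regex can be run over each line that comes out of the reader. The following is only a sketch: `UnescapeUnicode` is a hypothetical helper name, the temporary file stands in for the real text file, and it assumes the file contains only four-digit `\uXXXX` escapes.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class UnicodeUnescaper
{
    // Hypothetical helper: replaces every \uXXXX (four hex digits) with its character.
    public static string UnescapeUnicode(string text)
    {
        return Regex.Replace(
            text,
            @"\\[Uu]([0-9A-Fa-f]{4})",
            m =>
            {
                var code = ushort.Parse(
                    m.Groups[1].Value,
                    NumberStyles.AllowHexSpecifier,
                    CultureInfo.InvariantCulture);
                return ((char)code).ToString();
            });
    }

    public static void Main()
    {
        // Stand-in for the text file from the question (assumption for this demo).
        var path = Path.GetTempFileName();
        File.WriteAllText(path, @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc");

        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Prints the unescaped text; the console must be able to
                // display these characters for it to look right.
                Console.WriteLine(UnescapeUnicode(line));
            }
        }

        File.Delete(path);
    }
}
```

Characters outside the Basic Multilingual Plane are not covered by this pattern unless the file writes them as two `\uXXXX` surrogate escapes.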
dtb

This question came up as the first result when googling, but I thought there should be a simpler way... this is what I ended up using:

using System.Text.RegularExpressions;

//...

var str = "Ingl\\u00e9s";
var converted = Regex.Unescape(str);
Console.WriteLine($"{converted} {str != converted}"); // Inglés True
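
One thing to keep in mind: `Regex.Unescape` handles more than `\uXXXX`. It also converts other regex escapes such as `\t`, and its documentation lists an `ArgumentException` for input with an unrecognized escape sequence. A small sketch, assuming the input might contain other backslashes (the `C:\data` string is just an illustrative example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    public static void Main()
    {
        // \u00e9 becomes é, as in the snippet above.
        Console.WriteLine(Regex.Unescape(@"Ingl\u00e9s") == "Ingl\u00e9s"); // True

        // Other escapes are converted too: \t becomes a real tab character.
        Console.WriteLine(Regex.Unescape(@"a\tb") == "a\tb"); // True

        // Stray backslashes may be rejected, so guard the call if the
        // input is not fully under your control.
        try
        {
            Console.WriteLine(Regex.Unescape(@"C:\data"));
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("Could not unescape: " + ex.Message);
        }
    }
}
```

If the text can contain backslashes that are not part of a `\uXXXX` sequence, a pattern that targets only those sequences may be the safer choice.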
rraallvv
  • UrlDecode does nothing for `\u00e9`, because it is already the real character. If you apply this to the question it would be `string x = HttpUtility.UrlDecode("Ingl\\u00e9s");` which does not do anything. – Silvermind Aug 20 '20 at 11:09
  • @Silvermind good catch. Edited the answer, it should work now. – rraallvv Aug 20 '20 at 11:52