22

I have the following string of characters:

string s = "\\u0625\\u0647\\u0644";

When I print the above sequence, I get:

\u0625\u0647\u0644

How can I get the real printable Unicode characters instead of this \uxxxx representation?

Ruzihm
Marc Andreson
  • I find the question a bit vague. Do you control that string? If so, just remove one of the backslashes, i.e. "\u1234\u5678". If not, consider using a regex with a callback method to parse out the number, convert it to a char, and return that char as a string. – Onkelborg Jul 28 '12 at 12:01
  • What do you mean by "you can't control the string"? What's your scenario? – Sergei Rogovtcev Jul 28 '12 at 12:04
  • OK, I found the answer: System.Text.RegularExpressions.Regex.Unescape() – Marc Andreson Jul 28 '12 at 12:07
  • How do you go the other way, i.e. from an unescaped string that contains the Unicode character to the \\uXXXX escaped form? PS: I have tried the obvious `Regex.Escape(...)` method, but it doesn't work with the following: tomato sauce #thankyou! – Jaans Apr 29 '14 at 06:45
  • @MarcAndreson please add your solution as an answer and mark it as accepted, so that others can clearly see what solved your problem. – Konrad Gadzina Dec 24 '15 at 11:43

5 Answers

6

If you really don't control the string, then you need to replace those escape sequences with their values:

Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})", m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());

and hope that you don't have \\ escapes in there too.
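The caveat about `\\` escapes can be handled by matching escaped backslashes in the same pass. The following is a minimal sketch, assuming the input follows the C#/JSON convention in which `\\` stands for one literal backslash. The names `UnicodeEscapes` and `Decode` are hypothetical and not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Hypothetical helper for illustration.
    // Assumption: "\\" in the input denotes a single literal backslash.
    // The alternation tries the escaped backslash first, so the 'u' that
    // follows it is never mistaken for the start of a \uXXXX sequence.
    private static readonly Regex EscapePattern =
        new Regex(@"\\\\|\\u([0-9A-Fa-f]{4})");

    public static string Decode(string input)
    {
        return EscapePattern.Replace(input, m =>
            m.Groups[1].Success
                // \uXXXX: parse the four hex digits as one UTF-16 code unit.
                ? ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                // Escaped backslash: emit a single backslash.
                : "\\");
    }
}
```

With this sketch, `UnicodeEscapes.Decode("\\u0625\\u0647\\u0644")` yields the three Arabic letters إهل, while an input whose runtime text is `\\u0625` (two backslashes) yields the literal text `\u0625`. Characters outside the Basic Multilingual Plane arrive as two consecutive `\uXXXX` surrogate escapes and are decoded as two code units, which together form the intended character.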

Joey
3

Asker posted this as an answer to their question:

I have found the answer:

s = System.Text.RegularExpressions.Regex.Unescape(s);
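
A short sketch of how this call might be wrapped. Note that `Regex.Unescape` processes every regex-style escape it recognizes (for example `\n` and `\t`), not only `\uXXXX`, and its documentation lists an `ArgumentException` for unrecognized escape sequences, so it is best suited to input that is known to be well formed. The wrapper name `TryUnescape` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical wrapper: returns false (and the input unchanged)
    // instead of throwing when Regex.Unescape rejects the input.
    public static bool TryUnescape(string input, out string result)
    {
        try
        {
            result = Regex.Unescape(input);
            return true;
        }
        catch (ArgumentException)
        {
            result = input;
            return false;
        }
    }
}
```

For the string in the question, `UnescapeDemo.TryUnescape("\\u0625\\u0647\\u0644", out var text)` returns `true` and sets `text` to the three Arabic letters.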
Ruzihm
1

Try Regex:

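using System;
using System.Text;
using System.Text.RegularExpressions;

string inputString = "\\u0625\\u0647\\u0644";

var stringBuilder = new StringBuilder();
// Match a literal backslash, 'u', and four hex digits.
// Only the matched sequences are appended; any other text in the input is dropped.
foreach (Match match in Regex.Matches(inputString, @"\\u([0-9A-Fa-f]{4})"))
{
    // Parse the digits as base 16 and append the corresponding character.
    stringBuilder.Append((char)Convert.ToInt32(match.Groups[1].Value, 16));
}

var result = stringBuilder.ToString();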
Ria
-1

I had the string "\u0001" and wanted to get its numeric value.
After trying several approaches, this is what worked for me:

int val = Convert.ToInt32(Convert.ToChar("\u0001")); // val = 1;

If you have multiple characters, you can use the following technique:

var original ="\u0001\u0002";
var s = "";
for (int i = 0; i < original.Length; i++)
{
    s += Convert.ToInt32(Convert.ToChar(original[i]));
}

// s will be "12"
Hakan Fıstık
-2

I would suggest the use of String.Normalize. You can find everything here:

http://msdn.microsoft.com/it-it/library/8eaxk1x2.aspx

dierre