
Here is a simple test case. I feel like I'm missing something basic, but any help would be appreciated!

string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";
foreach (Match m in Regex.Matches(data, @"\\u(\w*)\b"))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
    string match = m.Value;
    // I expected these to print the same thing, but the first prints £377m and the second prints \u00a3377m
    Console.WriteLine("\u00a3377m" + "      " + match);
}
Jordan
  • Quite basic indeed. You're missing out on `@` turning `data` into a literal string. [A little brushing up on literals won't hurt](http://msdn.microsoft.com/en-US/library/vstudio/362314fe.aspx) – Ichabod Clay Jul 26 '13 at 09:06
  • Sorry, looking back I really didn't explain what I was trying to do very well. I was aware that I was making data a verbatim literal; I only did that so I didn't have to write \\u00a3377m, which may have confused some people. What I was actually trying to do was have match output a £ sign, just like the manually entered string does. The way I achieved this in the end was by using the function from http://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string . Thanks for the help though. – Jordan Jul 26 '13 at 10:24

3 Answers


You forgot to escape the backslash in the string you are printing manually. Because it is a regular (non-verbatim) literal, the compiler resolves the escape sequence \u00a3 to £ directly, so that side prints £377m.

The following works as desired:

// Escaping the backslash makes this print \u00a3377m, the same text as match
Console.WriteLine("\\u00a3377m" + "      " + match);

Another option is to use a verbatim string literal (the @ prefix):

Console.WriteLine(@"\u00a3377m" + "      " + match);
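To make this concrete, here is a small sketch (the class and method names are hypothetical) showing that the escaped literal, the verbatim literal, and the value the question's regex extracts from data are all the same ten-character text \u00a3377m:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two ways of keeping the escape sequence as plain text.
public static class LiteralDemo
{
    // Escaped backslash: "\\" is a single backslash, so the text \u00a3377m is kept as-is.
    public static string Escaped() => "\\u00a3377m";

    // Verbatim literal: backslashes are not escape characters, so the text is kept as-is.
    public static string Verbatim() => @"\u00a3377m";

    // The value the question's regex extracts from the verbatim data string.
    public static string MatchFromData()
    {
        string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";
        return Regex.Match(data, @"\\u(\w*)\b").Value;
    }
}
```

Comparing Escaped(), Verbatim() and MatchFromData() with == should give true in every case.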
Toastgeraet

U+00A3 is the Unicode code point of the £ character. Take a look: http://unicode-table.com/en/#00A3

So when you write "\u00a3377m" as a regular string literal, the result is £377m.
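The detail that explains the output is that a \u escape always consumes exactly four hex digits, so in "\u00a3377m" only 00a3 becomes £ and 377m stays ordinary text. A sketch that checks this split (the class and member names are hypothetical):

```csharp
// Hypothetical helper showing how the compiler splits "\u00a3377m".
public static class EscapeSplit
{
    // The compiler reads exactly four hex digits after \u: 00a3 -> U+00A3 (£).
    public const string Literal = "\u00a3377m";

    // Code point of the first character (char converts implicitly to int): 0xA3, i.e. 163.
    public static int FirstCodePoint() => Literal[0];

    // Everything after the decoded escape is left as ordinary text.
    public static string Rest() => Literal.Substring(1);
}
```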

Use a verbatim string literal instead, like this:

Console.WriteLine(@"\u00a3377m" + "      " + match);

Regarding your comment "I completely forgot to add to the question that I actually wanted the £ sign": you can create the £ character directly from its code point:

char c = '\u00a3';
string s = c.ToString(); // s will be £
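If the hex digits are only known at runtime (for example, taken from the regex match in the question), the same cast-and-ToString idea works after parsing them first. A sketch, assuming the input is exactly four hex digits such as "00a3"; the helper name FromHex is hypothetical:

```csharp
using System.Globalization;

// Hypothetical helper: turns four hex digits such as "00a3" into the character they encode.
public static class HexToChar
{
    public static string FromHex(string hexDigits)
    {
        // Parse the digits as a hexadecimal number, e.g. "00a3" -> 163.
        int codePoint = int.Parse(hexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        // Cast to char and convert to a string, as with char c = '\u00a3' above.
        return ((char)codePoint).ToString();
    }
}
```

Since NumberStyles.HexNumber accepts both cases, "00a3" and "00A3" should give the same result.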
Soner Gönül
  • Yes sorry, I completely forgot to add to the question that I actually wanted the £ sign and not the literal. I will answer the question explaining how I did it. – Jordan Jul 26 '13 at 10:28
    That looks like a nice solution; I didn't realise the .ToString function could be used in that way! Thanks for the information. – Jordan Jul 29 '13 at 09:33

I appreciate the help; however, it's my fault, as I left out some key information.

I actually wanted both sides of the output to show the £ sign, i.e. "£377m      £377m" rather than "£377m      \u00a3377m".

To do this, I ended up using the answer from "Replace unicode escape sequences in a string", which uses the following function:

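using System.Globalization;
using System.Text.RegularExpressions;

// Matches \u followed by exactly four hex digits, captured as "Value"
private static readonly Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

// Replaces every \uXXXX escape sequence in value with the character it encodes
public static string Decoder(string value)
{
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}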
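As an alternative (not what the linked answer uses), the framework's Regex.Unescape method also converts \uXXXX sequences into characters. A sketch with a hypothetical wrapper class; note that Regex.Unescape interprets every backslash in the input and may throw on escape sequences it does not recognise, so it is only a safe replacement when the text contains no other backslashes:

```csharp
using System.Text.RegularExpressions;

// Alternative sketch: let Regex.Unescape decode the \uXXXX sequences.
public static class UnescapeDecoder
{
    public static string Decode(string value)
    {
        // Converts escapes such as \u00a3 into the characters they stand for.
        // Caveat: it processes every backslash in the input, not only \u sequences.
        return Regex.Unescape(value);
    }
}
```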

Then use it like so:

string data = @"Well done UK building industry, Olympics \u00a3377m under budget + boost";
foreach (Match m in Regex.Matches(data, @"\\u(\w*)\b"))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
    string match = m.Value;
    // Decode the match so it no longer contains \u escape sequences
    match = Decoder(match);
    // Both sides now print £377m, because match has been decoded
    Console.WriteLine("\u00a3377m" + "      " + match);
}
Jordan