
This C# code...

string s = "\u00C0";
byte[] bytes = ASCIIEncoding.ASCII.GetBytes(s);
Trace.WriteLine(BitConverter.ToString(bytes));

produces the following output:

3F

Why is the output not C0?

Verax
    You are probably looking for `Encoding.GetEncoding("ISO-8859-1").GetBytes(s)`. It is the [only encoding that gives a byte value of exactly the code point value](http://stackoverflow.com/a/15938015/995876). – Esailija Apr 12 '13 at 06:48
  • I wonder, why do you need ASCIIEncoding? What's wrong with UTF8Encoding? – Pavel Radzivilovsky Apr 13 '13 at 12:53
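Following Esailija's comment, a minimal sketch (not from the original post) showing that ISO-8859-1 maps code points U+0000–U+00FF directly to single bytes, so the output is the expected C0:

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        string s = "\u00C0";
        // ISO-8859-1 (Latin-1) maps U+0000..U+00FF one-to-one to byte values.
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(s);
        Console.WriteLine(BitConverter.ToString(bytes)); // prints "C0"
    }
}
```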

3 Answers


Because \u00C0 is not an ASCII character (the 0-127 range). As a result, it is encoded as a question mark, ? (0x3F).

See MSDN article on ASCIIEncoding:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
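If silently substituting ? is undesirable, the encoder can be configured to throw instead. Here is a sketch (not from the original answer) using the standard `EncoderExceptionFallback` overload of `Encoding.GetEncoding`:

```csharp
using System;
using System.Text;

class StrictAsciiDemo
{
    static void Main()
    {
        // Request ASCII with an exception fallback instead of '?' substitution.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());

        try
        {
            strictAscii.GetBytes("\u00C0");
        }
        catch (EncoderFallbackException ex)
        {
            // U+00C0 is outside the 7-bit ASCII range, so encoding fails.
            Console.WriteLine("Cannot encode: " + ex.Message);
        }
    }
}
```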

Alexei Levenkov

It seems that you want a byte sequence that represents a string of Unicode characters. Obviously, the bytes depend on the encoding. Since you expect C0 to be one of the bytes, that narrows the options down a bit. Here is UTF-16LE, which produces two bytes here since \u00C0 is a BMP character encoded as a single UTF-16 code unit:

string s = "\u00C0";
byte[] bytes = Encoding.Unicode.GetBytes(s);
Trace.WriteLine(BitConverter.ToString(bytes));

You should read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
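To make the "bytes depend on the encoding" point concrete, here is a small comparison (not from the original answer) of the same character under a few common encodings; the byte values follow directly from the Unicode standard:

```csharp
using System;
using System.Text;

class EncodingComparison
{
    static void Main()
    {
        string s = "\u00C0";
        // Same character, different byte sequences per encoding.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s))); // UTF-16LE: C0-00
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));    // UTF-8:    C3-80
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(s)));   // UTF-32LE: C0-00-00-00
    }
}
```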

Tom Blodget

First, you assign a Unicode character to a string. Then you encode it to bytes as ASCII, even though the character is outside the ASCII range. Finally, you try to decode those bytes back to a string.

The following example runs through both possibilities to make my answer clearer:

    static void Main(string[] args)
    {
        string s = "\u00C0";
        Console.WriteLine(s);
        byte[] bytes = ASCIIEncoding.ASCII.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(bytes));
        Console.WriteLine(ASCIIEncoding.ASCII.GetString(bytes));

        Console.WriteLine("Again");
        bytes = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(bytes));
        Console.WriteLine(Encoding.UTF8.GetString(bytes));

        Console.ReadLine();
    }

And the output is:

À
3F
?
Again
C3-80
À

By the way, the documentation for BitConverter.ToString (the method used above) says:

Converts the numeric value of each element of a specified array of bytes to its equivalent hexadecimal string representation.
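A quick illustration (not from the original answer) of what `BitConverter.ToString` does with a byte array:

```csharp
using System;

class BitConverterDemo
{
    static void Main()
    {
        byte[] utf8Bytes = { 0xC3, 0x80 }; // the UTF-8 bytes for U+00C0
        // Each byte becomes two hex digits; bytes are joined with hyphens.
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // prints "C3-80"
    }
}
```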

Piotr Stapp
  • `BitConverter.ToString(bytes)` is a convenient way to convert a byte array to a hyphen-delimited hexadecimal string. It was used in the OP's code simply as a convenient way of outputting the byte array's values in hexadecimal. – Verax Apr 13 '13 at 00:05