Which encoding is the C# equivalent to this Java code? "Charles Okwuagwu".getBytes();

Question

Say I have this code in Java: "Charles Okwuagwu".getBytes();

In C# this is simply Encoding.UTF8.GetBytes("Charles Okwuagwu");

My question is this: 1) Java uses UTF-16 for strings, and 2) the string content is basically just ASCII.

Wouldn't it be equivalent in C# to simply use Encoding.ASCII.GetBytes("Charles Okwuagwu");?

EDIT: I ran this little test in .NET (B2H converts a byte array to a hex string):

Console.WriteLine("Default:{0}", B2H(Text.Encoding.Default.GetBytes("Charles Okwuagwu")))
Console.WriteLine("ASCII:{0}", B2H(Text.Encoding.ASCII.GetBytes("Charles Okwuagwu")))
Console.WriteLine("BigEndianUnicode:{0}", B2H(Text.Encoding.BigEndianUnicode.GetBytes("Charles Okwuagwu")))
Console.WriteLine("Unicode:{0}", B2H(Text.Encoding.Unicode.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF32:{0}", B2H(Text.Encoding.UTF32.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF7:{0}", B2H(Text.Encoding.UTF7.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF8:{0}", B2H(Text.Encoding.UTF8.GetBytes("Charles Okwuagwu")))

Results:

Default:436861726C6573204F6B777561677775
ASCII:436861726C6573204F6B777561677775
BigEndianUnicode:0043006800610072006C006500730020004F006B007700750061006700770075
Unicode:43006800610072006C006500730020004F006B00770075006100670077007500
UTF32:430000006800000061000000720000006C0000006500000073000000200000004F0000006B000000770000007500000061000000670000007700000075000000
UTF7:436861726C6573204F6B777561677775
UTF8:436861726C6573204F6B777561677775

It would seem that UTF8, UTF7, ASCII and Default all give the same bytes, but Java strings default to UTF-16 ...
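For comparison, here is a minimal Java sketch of the same test with the charset named explicitly on every call. The class name `EncodingComparison` and the helper `toHex` are made up for illustration (`toHex` stands in for `B2H`). It suggests that the UTF-16 in-memory representation of a string is a separate matter from the charset used when encoding it to bytes.

```java
import java.nio.charset.StandardCharsets;

public class EncodingComparison {
    // Hypothetical helper (stand-in for B2H): upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Charles Okwuagwu";
        // One byte per character for this ASCII-only text.
        System.out.println("US-ASCII: " + toHex(s.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("UTF-8:    " + toHex(s.getBytes(StandardCharsets.UTF_8)));
        // Two bytes per character; byte order differs between LE and BE.
        System.out.println("UTF-16LE: " + toHex(s.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE: " + toHex(s.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

The US-ASCII and UTF-8 lines should match the ASCII and UTF8 results above, and UTF-16LE and UTF-16BE should match Unicode and BigEndianUnicode respectively.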

I believe it would be `Encoding.Default.GetBytes("Charles Okwuagwu");`. – Sotirios Delimanolis, Jan 01 '15 at 16:30
Better to specify the encoding on both sides. In Java that is `"Charles Okwuagwu".getBytes("UTF-8");` or whatever encoding you prefer. – rossum, Jan 01 '15 at 16:42
@rossum What does Java do by default? Say I'm porting existing code from Java to C#. – Charles Okwuagwu, Jan 01 '15 at 17:08
Clarifying a couple of things: .NET also uses UTF-16 for strings. .NET's `Encoding.Unicode` would be better named `Encoding.UTF16LE`. – Tom Blodget, Jan 01 '15 at 21:07
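A rough sketch of what pinning the encoding on the Java side could look like, so the port does not depend on either platform's default. The class and method names are hypothetical, and the pairing with .NET encodings in the comments is the intended correspondence rather than something this code verifies. One caveat worth knowing: Java's plain `UTF_16` charset writes a big-endian byte order mark first, so `UTF_16LE` is the closer match for `Encoding.Unicode`.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Hypothetical helper: intended counterpart of Encoding.UTF8.GetBytes(s).
    static byte[] likeDotNetUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: intended counterpart of Encoding.Unicode.GetBytes(s),
    // i.e. UTF-16 little-endian without a byte order mark.
    static byte[] likeDotNetUnicode(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String s = "Charles Okwuagwu";
        System.out.println(likeDotNetUtf8(s).length);    // 16
        System.out.println(likeDotNetUnicode(s).length); // 32

        // Plain UTF_16 adds a 2-byte BOM (FE FF) and then writes big-endian.
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        System.out.println(withBom.length);              // 34
        System.out.printf("%02X %02X%n", withBom[0], withBom[1]); // FE FF
    }
}
```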

score 2 · Accepted Answer · edited Jan 01 '15 at 21:09

String.getBytes in Java uses the default encoding of the platform. So the C# equivalent would be:

Encoding.Default.GetBytes("Charles Okwuagwu");
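To check which charset the no-argument `getBytes()` actually picks on a given machine, something like the following sketch may help (the class and method names are illustrative). Note that the default depends on the runtime: on JDK 18 and later it is UTF-8 unless overridden, while older JDKs derive it from the OS locale. Likewise `Encoding.Default` is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and .NET 5+, so the two defaults need not agree across machines.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Hypothetical helper: true if getBytes() matches encoding with the default charset.
    static boolean usesDefaultCharset(String s) {
        byte[] implicit = s.getBytes();                          // no charset given
        byte[] explicit = s.getBytes(Charset.defaultCharset());  // default named explicitly
        return Arrays.equals(implicit, explicit);
    }

    public static void main(String[] args) {
        // The printed name varies by JDK version and OS settings.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(usesDefaultCharset("Charles Okwuagwu")); // true
    }
}
```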

edited Jan 01 '15 at 21:09 by Tom Blodget

answered Jan 01 '15 at 17:16 by Brett Okken

I accept this reasoning. This is the proper answer, I guess. – Charles Okwuagwu Jan 01 '15 at 17:23

score 0 · Answer 2 · answered Jan 01 '15 at 17:04

Since Java uses UTF-16, the equivalent .NET code would be:

Encoding.Unicode.GetBytes("Charles Okwuagwu")

See: http://msdn.microsoft.com/en-us/library/system.text.encoding.unicode(v=vs.110).aspx

answered Jan 01 '15 at 17:04 by Gabe

Would the output not essentially be the same if I use Encoding.ASCII, since the text only contains ASCII chars? – Charles Okwuagwu Jan 01 '15 at 17:10
You asked for the equivalent code. The results of Unicode.GetBytes and ASCII.GetBytes are not the same. Unicode: 67 0 104 0 97 0 ... ASCII: 67 104 97 ... – Gabe Jan 01 '15 at 17:24
Equivalent code should give an identical result; see my edit of the question, which includes some tests I ran. Unicode would give me different bytes. – Charles Okwuagwu Jan 01 '15 at 17:26
@CharlesO UTF-16 and ASCII will not be equivalent. UTF-16 will use (at least) 2 bytes per character. In this specific case, where all the characters are in ASCII, it will use exactly 2 bytes per character. – Brett Okken Jan 01 '15 at 17:26
@Gabe see Brett's comment above. – Charles Okwuagwu Jan 01 '15 at 17:29
@CharlesO I guess it depends on whether you want the platform default or the Java default (which is UTF16), see: http://stackoverflow.com/questions/4453269/encoding-of-string-in-java – Gabe Jan 01 '15 at 19:14
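The "(at least) 2 bytes per character" point can be illustrated with a small Java sketch. The class and helper names are made up, and the sample characters are just examples written as Unicode escapes: an ASCII letter, a Latin-1 letter, and a character outside the Basic Multilingual Plane that needs a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Width {
    // Hypothetical helper: byte count when encoded as UTF-16 (little-endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Hypothetical helper: byte count when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "C";                 // U+0043
        String latin = "\u00E9";            // U+00E9, e with acute accent
        String emoji = "\uD83D\uDE00";      // U+1F600, encoded as a surrogate pair

        System.out.println(utf16Bytes(ascii) + " vs " + utf8Bytes(ascii)); // 2 vs 1
        System.out.println(utf16Bytes(latin) + " vs " + utf8Bytes(latin)); // 2 vs 2
        System.out.println(utf16Bytes(emoji) + " vs " + utf8Bytes(emoji)); // 4 vs 4
    }
}
```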
