Say I have this code in Java: "Charles Okwuagwu".getBytes();

In C# this is simply Encoding.UTF8.GetBytes("Charles Okwuagwu");

My question is this: 1) Java uses UTF-16 for strings; 2) the string content is basically just ASCII.

Wouldn't it be equivalent in C# to simply use Encoding.ASCII.GetBytes("Charles Okwuagwu"); ?

EDIT: I ran this little test in .NET:

Console.WriteLine("Default:{0}", B2H(Text.Encoding.Default.GetBytes("Charles Okwuagwu")))
Console.WriteLine("ASCII:{0}", B2H(Text.Encoding.ASCII.GetBytes("Charles Okwuagwu")))
Console.WriteLine("BigEndianUnicode:{0}", B2H(Text.Encoding.BigEndianUnicode.GetBytes("Charles Okwuagwu")))
Console.WriteLine("Unicode:{0}", B2H(Text.Encoding.Unicode.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF32:{0}", B2H(Text.Encoding.UTF32.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF7:{0}", B2H(Text.Encoding.UTF7.GetBytes("Charles Okwuagwu")))
Console.WriteLine("UTF8:{0}", B2H(Text.Encoding.UTF8.GetBytes("Charles Okwuagwu")))

Results:

Default:436861726C6573204F6B777561677775
ASCII:436861726C6573204F6B777561677775
BigEndianUnicode:0043006800610072006C006500730020004F006B007700750061006700770075
Unicode:43006800610072006C006500730020004F006B00770075006100670077007500
UTF32:430000006800000061000000720000006C0000006500000073000000200000004F0000006B000000770000007500000061000000670000007700000075000000
UTF7:436861726C6573204F6B777561677775
UTF8:436861726C6573204F6B777561677775

It would seem UTF8, UTF7, and ASCII give the same bytes, but Java strings default to UTF16 ...

Charles Okwuagwu

2 Answers


String.getBytes in Java uses the default encoding of the platform. So the C# equivalent would be:

Encoding.Default.GetBytes("Charles Okwuagwu");
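
To check this on the Java side, here is a minimal sketch. The class name DefaultCharsetDemo and the toHex helper are made up for illustration (toHex plays the role of B2H from the question). It compares the no-argument getBytes() with an explicit Charset.defaultCharset() call, then prints the UTF-8 and ASCII bytes for the same string. What the default charset actually is depends on the JVM and the platform, so the first line of output will vary.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    // Hypothetical helper: upper-case hex dump, similar to B2H in the question.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Charles Okwuagwu";

        // getBytes() with no argument encodes with the platform default charset.
        byte[] implicit = s.getBytes();
        byte[] explicit = s.getBytes(Charset.defaultCharset());

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Same bytes: " + Arrays.equals(implicit, explicit));

        // For ASCII-only text these two encodings produce identical bytes.
        System.out.println("UTF-8: " + toHex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ASCII: " + toHex(s.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

If the bytes have to match across machines, passing an explicit charset on both sides (for example StandardCharsets.UTF_8 in Java and Encoding.UTF8 in C#) avoids depending on either platform's default.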
Tom Blodget
Brett Okken

Since Java uses UTF-16, the equivalent .NET code would be:

Encoding.Unicode.GetBytes("Charles Okwuagwu")

See: http://msdn.microsoft.com/en-us/library/system.text.encoding.unicode(v=vs.110).aspx
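
If UTF-16 bytes are what you want, Java needs an explicit charset to produce them. The sketch below (class name Utf16Demo and helper toHex are made up for illustration) shows which Java charsets should line up with the .NET results in the question: UTF_16LE with Encoding.Unicode and UTF_16BE with Encoding.BigEndianUnicode. Java's plain UTF_16 charset writes a big-endian byte-order mark first when encoding, so it matches neither of them byte for byte.

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {
    // Hypothetical helper: upper-case hex dump, similar to B2H in the question.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Charles Okwuagwu";

        // Little-endian, no BOM: compare with the "Unicode" line in the question.
        System.out.println("UTF-16LE: " + toHex(s.getBytes(StandardCharsets.UTF_16LE)));

        // Big-endian, no BOM: compare with the "BigEndianUnicode" line.
        System.out.println("UTF-16BE: " + toHex(s.getBytes(StandardCharsets.UTF_16BE)));

        // Big-endian with a leading FEFF byte-order mark.
        System.out.println("UTF-16:   " + toHex(s.getBytes(StandardCharsets.UTF_16)));
    }
}
```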

Gabe
  • Would the output not essentially be the same if I use Encoding.ASCII, since the text only contains ASCII chars? – Charles Okwuagwu Jan 01 '15 at 17:10
  • You asked for the equivalent code. The results of Unicode.GetBytes and ASCII.GetBytes are not the same. Unicode: 67 0 104 0 97 0 ... ASCII: 67 104 97 ... – Gabe Jan 01 '15 at 17:24
  • Equivalent code should give an identical result; see my edit of the question to include some tests I ran. Unicode would give me different bytes. – Charles Okwuagwu Jan 01 '15 at 17:26
  • @CharlesO UTF-16 and ASCII will not be equivalent. UTF-16 will use (at least) 2 bytes per character. In this specific case of all the characters being in ASCII, it will use 2 bytes per character. – Brett Okken Jan 01 '15 at 17:26
  • @Gabe see Brett's comment above. – Charles Okwuagwu Jan 01 '15 at 17:29
  • @CharlesO I guess it depends on if you want the platform default or the Java default (which is UTF16), see: http://stackoverflow.com/questions/4453269/encoding-of-string-in-java – Gabe Jan 01 '15 at 19:14