
I have a problem converting a string to bytes in Java while porting my C# library to it. The string is converted, but the resulting byte array is not the same.

I use this code in C#:

string input = "Test ěščřžýáíé 1234";
Encoding encoding = Encoding.UTF8;
byte[] data = encoding.GetBytes(input);

And this code in Java:

String input = "Test ěščřžýáíé 1234";
String encoding = "UTF8";
byte[] data = input.getBytes(encoding);

The left one is the Java output and the right one is the C# output. How can I make the Java output the same as the C# one?

[Screenshot: Java output on the left, C# output on the right]

PSSGCSim
  • It should be "UTF-8" (edit: shouldn't matter -- "UTF8" is an alias) – fge Feb 27 '14 at 12:12
  • Can you try and use `StandardCharsets.UTF_8` and the appropriate `.getBytes()` method? – fge Feb 27 '14 at 12:15
  • Wait wait wait -- how do you test that the bytes are the same? Don't forget that `byte` in C# is unsigned while it is a _signed_ value in Java – fge Feb 27 '14 at 12:17

2 Answers


In all likelihood, the byte arrays are the same. However, if you're formatting them to a string representation (e.g. to view them in a debugger), they will appear different, since the byte data type is treated as unsigned in C# (values 0 to 255) but signed in Java (values -128 to 127). Refer to this question and my answer for an explanation.

Edit: Based on this answer, you can print unsigned values in Java using:

byte b = -60;
System.out.println((short)(b & 0xFF));   // output: 196
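
The same masking can be applied to a whole array to get output that is directly comparable with the C# values. The following is only a minimal sketch: the class name `UnsignedDump` and the helper `toUnsignedString` are made up for illustration, and the input string is the one from the question, written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class UnsignedDump {
    // Hypothetical helper: renders each byte as an unsigned decimal value (0 to 255),
    // separated by spaces.
    static String toUnsignedString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF); // mask off the sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Test ěščřžýáíé 1234" from the question, written with Unicode escapes
        String input = "Test \u011b\u0161\u010d\u0159\u017e\u00fd\u00e1\u00ed\u00e9 1234";
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        System.out.println(toUnsignedString(data));
    }
}
```

If the printed values match what C# shows for the same string, the two arrays hold identical bytes and only the display differs.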
Douglas
  • And is there any way to get unsigned bytes in Java, as in C#, instead of signed ones? – PSSGCSim Feb 27 '14 at 12:25
  • @JanSchwar see my answer; but literally, you cannot get "unsigned bytes". Some libraries, like Guava, do provide helpers for such cases however. – fge Feb 27 '14 at 12:30
  • To compare the lists (visually) `for ( byte b : data ) System.out.println(b < 0 ? 256 + b : b);` – Eric Feb 27 '14 at 12:33

These arrays are very probably the same.

You have run into a big difference between C# and Java: in Java, byte is signed, whereas in C# it is unsigned.

To dump the bytes, try this:

public void dumpBytesToStdout(final byte[] array)
{
    for (final byte b: array)
        System.out.printf("%02X\n", b);
}

Then write an equivalent dump method in C# (no idea how; I don't do C#).

Alternatively, if your dump function involves integer types larger than byte, for instance an int, do:

i & 0xff

to remove the sign bits. Note that if you cast byte -1, which reads:

1111 1111

to an int, this will NOT give:

0000 0000 0000 0000 0000 0000 1111 1111

but:

1111 1111 1111 1111 1111 1111 1111 1111

i.e., the sign bit is "carried" into the upper bits (otherwise, the cast would yield the int value 255, which is not -1).
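
The widening behaviour described above can be checked with a short sketch. The class and method names here (`SignExtension`, `widen`, `toUnsigned`) are hypothetical and used only for illustration; `Integer.toBinaryString` is the standard JDK method.

```java
public class SignExtension {
    // Plain widening: the sign bit of the byte is copied into the upper 24 bits.
    static int widen(byte b) {
        return b;
    }

    // Masking with 0xFF keeps only the low 8 bits, giving a value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = -1;
        // Bit pattern after a plain cast to int
        System.out.println(Integer.toBinaryString(widen(b)));
        // Bit pattern after masking (leading zeros are not printed)
        System.out.println(Integer.toBinaryString(toUnsigned(b)));
        // Decimal value after masking
        System.out.println(toUnsigned(b));
    }
}
```

On Java 8 and later, `Byte.toUnsignedInt(b)` should give the same result as the mask.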

fge