-1

Consider the following code snippet:

byte[] b = new byte[]{ 0, 0, 0, -127 };  // example byte array

// convert the byte array to a String using UTF-8
String s = new String(b, StandardCharsets.UTF_8);

Now convert the string back into a byte array:

b = s.getBytes(StandardCharsets.UTF_8);

When we compare the result with the original byte array, the values are not the same after the round trip:

[0, 0, 0, -17, -65, -67]

Can anyone suggest how we can convert the string back to the original byte array?

LowLevel
Yashpal Singla

3 Answers

1

The most reliable approach is to convert between the byte array and a hex string, where each byte becomes two characters in the range 0-9 and A-F. Those characters are plain ASCII, so they survive a UTF-8 round trip unchanged.

Then convert the hex string back to a byte array. See these other Stack Overflow questions for how to do each step:

Byte to hex: How to convert a byte array to a hex string in Java?

Hex to byte: Convert a string representation of a hex dump to a byte array using Java?
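For illustration, here is a minimal sketch of the hex round trip using only `Character.forDigit` and `Character.digit` from the standard library. The class and method names (`HexRoundTrip`, `toHex`, `fromHex`) are made up for this example and are not taken from the linked questions.

```java
import java.util.Arrays;

// Hypothetical helper class, written for this example only.
public class HexRoundTrip {

    // Encode each byte as two hex digits, high nibble first.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Decode a hex string back into the bytes it represents.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at pair " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 0, -127};
        String hex = toHex(original);
        byte[] restored = fromHex(hex);
        System.out.println(hex);                               // 00000081
        System.out.println(Arrays.equals(original, restored)); // true
    }
}
```

The hex string is twice as long as the byte array, which is the price for a representation that any text encoding can carry safely.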

dskow
0

Although I don't understand why you need a string that is not valid UTF-8, here is a solution that illustrates the idea. Paste this code into your TestDrive class (a runnable class containing a static void main(String[] args) method):

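// Requires "import java.util.Arrays;" at the top of the file.
public static void main(String[] args) {
    byte[] bytes1 = new byte[]{0, 0, 0, -127};
    // Map each byte to its unsigned value (0..255), then to one char each.
    int[] unsigned = toUnsignedInt(bytes1);
    String utf8String = toUtf8String(unsigned);
    // Reverse the mapping: one char back to one byte.
    char[] chars = utf8String.toCharArray();
    byte[] bytes2 = toBytes(chars);
    System.out.println(Arrays.equals(bytes1, bytes2));
}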

// Not used by main. Widening a byte to int keeps its signed value (-128..127).
private static int[] toSigned(byte[] unsigned) {
    int[] signed = new int[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        signed[i] = unsigned[i];
    }
    return signed;
}

private static int[] toUnsignedInt(byte[] signed) {
    int[] unsigned = new int[signed.length];
    for (int i = 0; i < signed.length; i++) {
        unsigned[i] = Byte.toUnsignedInt(signed[i]);
    }
    return unsigned;
}

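// Despite the name, no UTF-8 decoding happens here:
// each value 0..255 becomes exactly one char with the same code.
private static String toUtf8String(int[] unsigned) {
    char[] chars = toChars(unsigned);
    return new String(chars);
}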

private static char[] toChars(int[] unsigned) {
    char[] chars = new char[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        chars[i] = (char) unsigned[i];
    }
    return chars;
}

private static byte[] toBytes(char[] chars) {
    int[] unsigned = toUnsignedInt(chars);
    byte[] bytes = new byte[unsigned.length];
    for (int i = 0; i < unsigned.length; i++) {
        bytes[i] = (byte) unsigned[i];
    }
    return bytes;
}

private static int[] toUnsignedInt(char[] chars) {
    int[] unsigned = new int[chars.length];
    for (int i = 0; i < chars.length; i++) {
        unsigned[i] = (int) chars[i];
    }
    return unsigned;
}
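The helpers above map every byte to the char with the same numeric value (0 to 255). As far as I know, that is the same mapping the built-in ISO-8859-1 charset performs, so a shorter sketch of the same idea could look like the one below. The class and method names (`Latin1RoundTrip`, `bytesToString`, `stringToBytes`) are hypothetical and exist only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class, written for this example only.
public class Latin1RoundTrip {

    // One byte in, one char out: byte value 0x81 becomes char U+0081.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Only lossless for strings produced by bytesToString
    // (every char must be in the range U+0000..U+00FF).
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 0, -127};
        String s = bytesToString(original);
        byte[] restored = stringToBytes(s);
        System.out.println(Arrays.equals(original, restored)); // true
    }
}
```

The resulting string is not meaningful text, and it will be damaged again if something later re-encodes it with a different charset, so this only helps when both ends agree on ISO-8859-1.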
LowLevel
-1

The most stable answer is to leave the byte array alone, pass it around as it is, and avoid the String round trip completely. String is not a container for binary data.
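To see why, here is a small sketch that compares lenient and strict UTF-8 decoding of the bytes from the question. The byte -127 (0x81) cannot start a UTF-8 sequence, so `new String(...)` silently replaces it with U+FFFD, while a `CharsetDecoder` set to `CodingErrorAction.REPORT` throws instead. The class and method names (`Utf8Check`, `isValidUtf8`) are hypothetical and exist only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class, written for this example only.
public class Utf8Check {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] b = {0, 0, 0, -127};

        // Lenient decoding: the invalid byte is replaced, so information is lost.
        String s = new String(b, StandardCharsets.UTF_8);
        System.out.println(s.charAt(3) == '\uFFFD'); // true

        // Strict decoding: the same input is rejected.
        System.out.println(isValidUtf8(b)); // false
    }
}
```

U+FFFD encodes to the three bytes -17, -65, -67 in UTF-8, which matches the tail of the array shown in the question.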

Undo
user207421