I've run into behavior in Java that I can't explain, so I'm posting a question.

  1. Put some data into the byte array
  2. Convert byte array to string
  3. Convert the converted string back to a byte array
  4. Compare the original data with the data after conversion: some of the bytes come out different.

The source code and log are below

Source code

    // 1. Input byte array data
    byte[] beforeBytes = new byte[]{(byte)-83, (byte)-95, (byte)-55, (byte)-49, (byte)3};
    log.info("Before ByteTest");
    log.info("bytesLength : " + beforeBytes.length);
    for (int i = 0; i < beforeBytes.length; ++i)
    {
        log.info(i + " : " + (int)beforeBytes[i]);
    }

    // 2. Convert byte array to string
    String testString = new String(beforeBytes);

    // 3. Convert string to byte array
    byte[] afterBytes = testString.getBytes();
    log.info("After ByteTest");
    log.info("bytesLength : " + afterBytes.length);
    for (int i = 0; i < afterBytes.length; ++i)
    {
        log.info(i + " : " + (int)afterBytes[i]);
    }

Log

    Before ByteTest
    bytesLength : 5
    0 : -83
    1 : -95
    2 : -55
    3 : -49
    4 : 3

    After ByteTest
    bytesLength : 5
    0 : 63
    1 : -95
    2 : -55
    3 : 63
    4 : 3

I want the data to stay identical to the original after the conversion. Is there a workaround?

  • Does this answer your question? [Java byte array contains negative numbers](https://stackoverflow.com/questions/9609394/java-byte-array-contains-negative-numbers) – sorifiend Nov 11 '20 at 04:05
  • Great question, and it is easy to get confused here. Java doesn't have unsigned bytes: a `byte` is signed (-128 to 127), which is why negative numbers appear. If you want to keep the negatives, you can use another data type such as an int array, `int[] beforeInt = new int[]{-83, -95, -55, -49, 3};`, or you can convert them to unsigned values as shown here: https://stackoverflow.com/a/6966609/1270000 – sorifiend Nov 11 '20 at 04:08
  • 63 is `?`, and is used as a replacement character when a byte combination doesn't map to a valid character in the character set (and the character set isn't UTF-8, which would use a different character as the replacement character). So, the problem depends on your actual character set (print out `System.getProperty("file.encoding")`). You should always explicitly specify the character set. However, if your goal is to store **binary** data in a string, this is not the correct way to do it (although you could use character set iso-8859-1). – Mark Rotteveel Nov 11 '20 at 15:26
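
To illustrate the point about character sets, here is a minimal sketch that round-trips the question's bytes through a string with an explicitly named charset. The class name `CharsetRoundTrip` and the helper `roundTrip` are hypothetical names chosen for this example; the calls themselves (`new String(byte[], Charset)`, `String.getBytes(Charset)`, `Charset.defaultCharset()`) are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {

    // Hypothetical helper: decode the bytes to a String and encode them back,
    // using the same explicitly specified charset for both steps.
    static byte[] roundTrip(byte[] data, Charset charset) {
        String text = new String(data, charset);
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] before = {(byte) -83, (byte) -95, (byte) -55, (byte) -49, (byte) 3};

        // This is the charset that new String(bytes) and getBytes() fall back to
        // when no charset is given; it varies between machines and JVM settings.
        System.out.println("default charset: " + Charset.defaultCharset());

        // ISO-8859-1 maps each of the 256 byte values to exactly one character,
        // so the round trip returns the original bytes.
        byte[] latin1 = roundTrip(before, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 lossless: " + Arrays.equals(before, latin1));

        // These bytes are not valid UTF-8, so decoding substitutes replacement
        // characters and the original values cannot be recovered.
        byte[] utf8 = roundTrip(before, StandardCharsets.UTF_8);
        System.out.println("UTF-8 lossless: " + Arrays.equals(before, utf8));
    }
}
```

The ISO-8859-1 round trip only shows that a lossless charset exists; the resulting string contains control and unprintable characters, so it is not a good container for binary data that has to be displayed or transmitted as text.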

1 Answer


You should use an encoder and decoder like this:

    // 2. Convert byte array to string
    String testString = Base64.getEncoder().encodeToString(beforeBytes);

    // 3. Convert string to byte array
    byte[] afterBytes = Base64.getDecoder().decode(testString);

So your final code will look like this:

    // Requires: import java.util.Base64;
    // 1. Input byte array data
    byte[] beforeBytes = new byte[]{(byte)-83, (byte)-95, (byte)-55, (byte)-49, (byte)3};
    log.info("Before ByteTest");
    log.info("bytesLength : " + beforeBytes.length);
    for (int i = 0; i < beforeBytes.length; ++i)
    {
        log.info(i + " : " + (byte)beforeBytes[i]);
    }

    // 2. Convert byte array to string
    String testString = Base64.getEncoder().encodeToString(beforeBytes);

    // 3. Convert string to byte array
    byte[] afterBytes = Base64.getDecoder().decode(testString);
    log.info("After ByteTest");
    log.info("bytesLength : " + afterBytes.length);
    for (int i = 0; i < afterBytes.length; ++i)
    {
        log.info(i + " : " + (byte)afterBytes[i]);
    }

The approach you were using, `new String(bytes)` followed by `.getBytes()`, is not appropriate here: neither call specifies a charset (UTF-8, for example), so the platform default is used, and decoding can replace or modify bytes that do not form valid characters in that charset. For arbitrary binary data, the Base64 encoder and decoder are a much better fit. Kind regards.
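
As a quick way to confirm that the Base64 round trip preserves the data, here is a minimal self-contained sketch that compares the arrays with `Arrays.equals` instead of reading the log by eye. The class name `Base64RoundTrip` and the helpers `toText` and `fromText` are hypothetical names for this example; `java.util.Base64` is available from Java 8 onward.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {

    // Hypothetical helper: encode arbitrary bytes as an ASCII-only Base64 string.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Hypothetical helper: decode a Base64 string back to the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] before = {(byte) -83, (byte) -95, (byte) -55, (byte) -49, (byte) 3};

        String text = toText(before);
        byte[] after = fromText(text);

        System.out.println("encoded: " + text);
        // Arrays.equals compares length and every element, so a single
        // altered byte would make this print false.
        System.out.println("identical: " + Arrays.equals(before, after));
    }
}
```

Base64 output is roughly a third larger than the input, which is the price of a string that is safe to log, store, or send as text regardless of the default charset.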

Dharman