0

I'm trying to convert a byte array to a String, then back to a byte array. The first part (byte[] to String) works, but when I convert the String back to a byte array and compare the result with my initial byte array, the two are different. I'm guessing it's an encoding issue. I tried different solutions (using UTF-8, ISO-8859-1, UTF-16LE and others), but none seem to work.

Would anyone know how to solve this problem? Thanks in advance

    Path path = Paths.get("C:\\folder1", "profil1.bmp");

    try {

        //file to byte[]
        byte[] byte_array = Files.readAllBytes(path);
        System.out.println(Arrays.toString(byte_array));

        //byte[] to string
        String byte_string = Arrays.toString(byte_array);

        //String to byte[]
        byte[] string_byte = byte_string.getBytes();

        System.out.println(Arrays.equals(byte_array, string_byte));

    } catch (IOException e) {
        System.out.println(e);
    }

Here's the output (the result was too long, so I cut off part of it):

[66, 77, -10, -44, 1, 0, 0, 0, 0, 0, 1, -1, ....... ,-1]
false
azurefrog
Husayn Hakeem
  • Why do you want to treat your BMP data as a `String` ? – Alnitak Jul 16 '15 at 14:46
  • I want to send it in an ArrayList> alongside other information (which are all strings) – Husayn Hakeem Jul 16 '15 at 14:48
  • Obviously the [Characterset](http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html) in both `byte[]` is different. Try big-endian or little-endian (depends on your OS) – MaxZoom Jul 16 '15 at 14:50
  • Try `US-ASCII` encoding (but make sure you use it for both encoding and decoding). It's generally a bad idea to treat raw data as Strings, but if you've no choice... – Alnitak Jul 16 '15 at 14:51

2 Answers

3

Arrays.toString(byte[]) doesn't just convert the byte[] into a String, it converts it to a human-readable format. When you then call getBytes() on that String, it is converting the characters that represent the original byte information into a byte[], along with the formatting characters, such as the brackets and commas.
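
To make this concrete, here is a minimal sketch using a three-byte array. The class and method names (`ToStringDemo`, `textBytes`) are made up for illustration, and US-ASCII is passed explicitly only so the result does not depend on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ToStringDemo {
    // Hypothetical helper: returns the bytes of the human-readable text
    // that Arrays.toString produces, not the original data.
    static byte[] textBytes(byte[] data) {
        String text = Arrays.toString(data);               // "[66, 77, -10]" for the sample below
        return text.getBytes(StandardCharsets.US_ASCII);   // one byte per character of that text
    }

    public static void main(String[] args) {
        byte[] original = {66, 77, -10};
        byte[] fromText = textBytes(original);

        System.out.println(original.length);                    // 3
        System.out.println(fromText.length);                    // 13: digits, signs, brackets, commas, spaces
        System.out.println(fromText[0] == '[');                 // true: the first byte is the opening bracket
        System.out.println(Arrays.equals(original, fromText));  // false
    }
}
```

The second array holds the character codes of the printed text, which is why the comparison in the question can never succeed.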

If you want to create a String from a byte[] use the String constructor which takes a byte[] to explicitly create a String object containing your data:

    ...
    //byte[] to string
    String byte_string = new String(byte_array);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes();

    System.out.println(Arrays.equals(byte_array, string_byte));

As pointed out by others, not all binary data is cleanly represented in all character sets, so you might be able to get the conversion to work by explicitly specifying the encoding.

For instance, the above sample code still outputs false when I try to encode an executable program file (.exe), but compares as true if I specify ISO_8859_1 encoding:

    //byte[] to string
    String byte_string = new String(byte_array, StandardCharsets.ISO_8859_1);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes(StandardCharsets.ISO_8859_1);

    System.out.println(Arrays.equals(byte_array, string_byte));
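
One way to see why ISO-8859-1 behaves differently is to push every possible byte value through a decode/encode round trip. The sketch below does that for three charsets; `CharsetRoundTrip`, `roundTrips` and `allByteValues` are illustrative names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decodes the bytes with the given charset, encodes the resulting String
    // again, and reports whether the original bytes came back unchanged.
    static boolean roundTrips(byte[] data, Charset cs) {
        String s = new String(data, cs);
        return Arrays.equals(data, s.getBytes(cs));
    }

    // Builds an array containing each of the 256 possible byte values once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(all, StandardCharsets.UTF_8));      // false
        System.out.println(roundTrips(all, StandardCharsets.US_ASCII));   // false
    }
}
```

ISO-8859-1 maps each byte value to exactly one character, so nothing is lost. In UTF-8 and US-ASCII some byte values are not valid on their own and get replaced during decoding.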

The absolute safest way to convert your data to a String and back would be to use base64 encoding as suggested by this answer:

    //file to byte[] 
    byte[] byte_array = Files.readAllBytes(path);
    //Base64 is assumed to be org.apache.commons.codec.binary.Base64 (Apache Commons Codec)
    byte[] encoded = Base64.encodeBase64(byte_array);

    //byte[] to string
    String byte_string = new String(encoded, StandardCharsets.US_ASCII);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes(StandardCharsets.US_ASCII);
    byte[] decoded = Base64.decodeBase64(string_byte);

    System.out.println(Arrays.equals(byte_array, decoded));
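
The `Base64` class in the snippet above is not part of the JDK. On Java 8 or later, a similar round trip can be sketched with the JDK's own `java.util.Base64`. The names `Base64RoundTrip`, `toText` and `fromText` are illustrative, and the sample bytes are the first few values from the question's output.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // byte[] to String: Base64 text uses only ASCII characters,
    // so it can sit in a list next to other strings.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // String to byte[]: reverses toText.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {66, 77, -10, -44, 1, 0};
        String text = toText(original);
        byte[] decoded = fromText(text);

        System.out.println(text);
        System.out.println(Arrays.equals(original, decoded)); // true
    }
}
```

The encoded text is roughly a third longer than the raw data, which is the price of a representation that survives any charset.
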
Community
azurefrog
1

In Java, char and String hold Unicode text by design (unlike in some other languages). That means they:

  • always convert to and from binary data (byte[]) using a charset, that is, the encoding of the bytes;
  • cannot hold arbitrary binary data, because bytes that are not well-formed in that encoding are lost in the conversion;
  • may mix several scripts (Latin, Cyrillic, Arabic, symbols).

So:

byte[] b = s.getBytes(StandardCharsets.UTF_8);
s = new String(b, StandardCharsets.UTF_8);

Without the charset parameter, the platform-dependent default encoding is used. The conversion may substitute placeholders for non-representable characters, or the resulting binary data may be malformed.
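
As a small illustration of that substitution, the sketch below decodes a byte that is never valid in UTF-8 and encodes the result again. `ReplacementDemo` and `utf8RoundTrip` are made-up names for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    // Decodes the bytes as UTF-8 and encodes the resulting String back to UTF-8.
    static byte[] utf8RoundTrip(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // -1 is the byte 0xFF, which cannot appear anywhere in well-formed UTF-8.
        byte[] raw = {66, 77, -1};
        String s = new String(raw, StandardCharsets.UTF_8);

        // The invalid byte was decoded as the replacement character U+FFFD.
        System.out.println(s.charAt(2) == '\uFFFD');            // true

        // Encoding U+FFFD yields three bytes, so the original value is gone.
        System.out.println(Arrays.toString(utf8RoundTrip(raw))); // [66, 77, -17, -65, -67]
    }
}
```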

Text (String/char) is entirely separate from binary data (byte). Also note that a char is a 2-byte UTF-16 code unit, whereas a byte is 1 byte.

Joop Eggen
  • Thanks for explaining, I now understand my mistake. I changed my code and tried doing what you said, but I'm still getting "false" (so the two strings are still different) – Husayn Hakeem Jul 16 '15 at 15:02