5

I tried to convert a byte[] to a String as follows:

Map<String, String> biomap = new HashMap<String, String>();
biomap.put("L1", new String(Lf1, "ISO-8859-1"));

where Lf1 is a byte[] array, and later I convert this string back to a byte[]. The problem is that when I convert the byte array to a string, it comes out like this:

FMR  F P�d@� �0d@r (@� ......... etc

The conversion back to a byte[] is:

String SF1 = biomap.get("L1");
byte[] storedL1 = SF1.getBytes("ISO-8859-1");

When I convert it back to a byte array and compare the two arrays, the comparison returns false, meaning the data has changed.

I want to get back the same byte[] data that I had before encoding it to a string and decoding it to a byte[] again.

Rudy Velthuis

2 Answers

10

First: ISO-8859-1 does not cause any data loss when an arbitrary byte array is converted to a string with it, because it maps each of the 256 possible byte values to its own character. Consider the following program:

public class BytesToString {
    public static void main(String[] args) throws Exception {
        // array that will contain all the possible byte values
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) (i + Byte.MIN_VALUE);
        }

        // converting to string and back to bytes
        String str = new String(bytes, "ISO-8859-1");
        byte[] newBytes = str.getBytes("ISO-8859-1");

        if (newBytes.length != 256) {
            throw new IllegalStateException("Wrong length");
        }
        boolean mismatchFound = false;
        for (int i = 0; i < 256; i++) {
            if (newBytes[i] != bytes[i]) {
                System.out.println("Mismatch: " + bytes[i] + "->" + newBytes[i]);
                mismatchFound = true;
            }
        }
        System.out.println("Whether a mismatch was found: " + mismatchFound);
    }
}

It builds an array of bytes with all possible byte values, then it converts it to String using ISO-8859-1 and then back to bytes using the same encoding.

This program outputs "Whether a mismatch was found: false", so the bytes -> String -> bytes conversion via ISO-8859-1 yields exactly the data you started with.
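
The same round trip can be checked more compactly. The following is only a sketch and not part of the original program: the class name RoundTripCheck and the method roundTrip are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It compares the arrays with Arrays.equals, because == and the equals() method of an array compare references rather than contents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripCheck {
    // Decodes the bytes as ISO-8859-1 and encodes the resulting string back to bytes.
    static byte[] roundTrip(byte[] input) {
        String asText = new String(input, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // One element for every possible byte value.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = roundTrip(all);

        // Content comparison: the right way to compare two arrays.
        System.out.println("Same contents: " + Arrays.equals(all, back));
        // Reference comparison: false here, because getBytes returns a new array.
        System.out.println("Same reference: " + (all == back));
    }
}
```

If a comparison of the two arrays reports a difference even though the encoding is ISO-8859-1 in both directions, it may be worth checking whether the comparison is done on references instead of contents.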

But, as was pointed out in the comments, String is not a good container for binary data. Such a string will almost surely contain unprintable characters, so if you print it or pass it through HTML or some other text channel, you can run into problems (data loss, for example).

If you really need to convert a byte array to a string (and treat that string as opaque), use base64 encoding:

// Requires: import java.util.Base64; (Java 8 or later)
String stringRepresentation = Base64.getEncoder().encodeToString(bytes);
byte[] decodedBytes = Base64.getDecoder().decode(stringRepresentation);

It takes more space (roughly 4/3 as many characters as there are bytes), but the resulting string consists only of printable ASCII characters, so it is safe to print.
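
Applied to the map from the question, this could look like the sketch below. The class name Base64MapExample, the helpers store and load, and the sample bytes are assumptions made for illustration; the question does not show the real contents of Lf1. It assumes Java 8 or later for java.util.Base64.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

final class Base64MapExample {
    // Stores the bytes under the given key as Base64 text.
    static void store(Map<String, String> map, String key, byte[] data) {
        map.put(key, Base64.getEncoder().encodeToString(data));
    }

    // Reads the Base64 text back and decodes it to the original bytes.
    static byte[] load(Map<String, String> map, String key) {
        return Base64.getDecoder().decode(map.get(key));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the question's Lf1 array.
        byte[] lf1 = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};

        Map<String, String> biomap = new HashMap<>();
        store(biomap, "L1", lf1);

        // The stored value is plain ASCII and can be printed safely.
        System.out.println("Stored text: " + biomap.get("L1"));
        System.out.println("Round trip intact: " + Arrays.equals(lf1, load(biomap, "L1")));
    }
}
```

Six input bytes become eight Base64 characters here, which matches the 4/3 growth mentioned above.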

Roman Puchkovskiy
  • That's why I was using ISO-8859-1 encoding, but unfortunately the output was not what I wanted. Anyway, now it is working. Thanks for the quick reply. +1 – Kumar Gaurav Sharma Jun 17 '17 at 10:27
  • Is there any way to shorten the String that is generated from the byte[]? – Vishal Senjaliya Jan 05 '19 at 07:05
  • @VishalSenjaliya you could use base64 or another baseXX variant that suits your data – Roman Puchkovskiy Jan 15 '19 at 12:48
  • @RomanPuchkovskiy base64 would increase the length of his data by a factor of roughly 4/3, in terms of number of characters, compared to what he's using above. Since Java uses UTF-16 as the string encoding, it may be possible to encode ~1.5 bytes per character or so with some whackadoodle encoding, but in terms of memory footprint you will never have a String encoding of a byte sequence that takes less memory than the byte sequence itself. – Jason Carlson Mar 05 '19 at 18:23
2

There are special encodings, such as base64, for representing binary data in text-only systems.

Converting a byte[] to a String is only guaranteed to work if the byte[] contains a valid sequence of bytes according to the chosen encoding. Unknown byte sequences might be replaced with the Unicode replacement character (as shown in your example).
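
A small sketch of that effect, using UTF-8 as the encoding with invalid sequences. The class name LossyDecodeDemo and the helper survivesRoundTrip are made up for illustration. ISO-8859-1 is included for contrast: it assigns a character to every byte value, so it has no invalid sequences.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyDecodeDemo {
    // Returns true if decoding and re-encoding with the charset reproduces the input.
    static boolean survivesRoundTrip(byte[] input, Charset charset) {
        String decoded = new String(input, charset);
        return Arrays.equals(input, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // 0xFF never occurs in well-formed UTF-8, so this array is not valid UTF-8.
        byte[] data = {(byte) 0xFF, 0x41};

        // The String constructor substitutes U+FFFD for the malformed byte.
        String asUtf8 = new String(data, StandardCharsets.UTF_8);
        System.out.println("Contains U+FFFD: " + (asUtf8.indexOf('\uFFFD') >= 0));

        System.out.println("UTF-8 round trip: "
                + survivesRoundTrip(data, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip: "
                + survivesRoundTrip(data, StandardCharsets.ISO_8859_1));
    }
}
```

The original byte is gone once it has been replaced, so re-encoding the UTF-8 string cannot recover it.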

ooxi