I just thought of a good way of showing what happens by replacing the new String(byte[])
constructor with a method of my own, which is why I will answer the question. The method performs the same basic action as the constructor, with one difference: it throws an exception if any invalid byte sequence is found.
package com.stackexchange.so;

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

public class ShowBadEncoding {

    private static final byte[] test_key = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};

    public static void main(String[] args) throws Exception {
        byte[] encryptedArray = xor("ciao".getBytes(), test_key);
        System.out.println("Encrypted array: " + Arrays.toString(encryptedArray));

        final String encrypted = new String(encryptedArray);
        // original: the constructor silently replaces malformed byte sequences
        System.out.println("Length: " + new String(encryptedArray).length());
        // replacement: the decode method throws on malformed byte sequences
        System.out.println("Length: " + decode(encryptedArray).length());

        System.out.println(Arrays.toString(encrypted.getBytes()));
        System.out.println("Encrypted value: " + encrypted);
        System.out.println("Decrypted value: " + new String(xor(encrypted.getBytes(), test_key)));
    }

    private static String decode(byte[] encryptedArray) throws CharacterCodingException {
        var decoder = Charset.defaultCharset().newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        var decoded = decoder.decode(ByteBuffer.wrap(encryptedArray));
        return decoded.toString();
    }

    private static byte[] xor(byte[] data, byte[] key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return result;
    }
}
The method is called decode
because that's what you are actually doing: you are decoding the bytes back to text. A character encoding encodes characters as bytes, so the opposite direction must be decoding after all.
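For reference, this is what the two directions look like when the charset is made explicit (a minimal sketch separate from the program above; it needs import java.nio.charset.StandardCharsets):

    // encoding: characters -> bytes
    byte[] bytes = "ciao".getBytes(StandardCharsets.UTF_8);
    // decoding: bytes -> characters
    String text = new String(bytes, StandardCharsets.UTF_8);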
As you will see, the program above will first print a length of 2
if your platform uses UTF-8 as its default encoding (Linux, Android, macOS). On Windows, which uses the Windows-1252 charset instead (a single-byte encoding that extends Latin-1, which in turn extends ASCII), you can reproduce the same result by replacing Charset.defaultCharset()
with StandardCharsets.UTF_8.
However, the decode
method will instead throw the following exception:
java.nio.charset.MalformedInputException: Input length = 3
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:815)
at StackExchange/com.stackexchange.so.ShowBadEncoding.decode(ShowBadEncoding.java:36)
at StackExchange/com.stackexchange.so.ShowBadEncoding.main(ShowBadEncoding.java:24)
Now maybe you'd expect 4 here, the size of the byte array. But note that a single UTF-8 character may be encoded over multiple bytes. The error is reported not for the entire string but for the malformed sequence the decoder was trying to read: the lead byte announces a longer multi-byte character than the following bytes actually complete, hence Input length = 3.
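To make that concrete: by my own calculation (assuming an ASCII-compatible default charset for "ciao".getBytes()), the XOR above produces the bytes 0xF3 0xAE 0xB2 0x12. The lead byte 0xF3 announces a four-byte UTF-8 sequence, 0xAE and 0xB2 are valid continuation bytes, but 0x12 is not, so the decoder reports a malformed sequence of three bytes:

    // assumed values: "ciao" XORed with the first four key bytes (0x90, 0xC7, 0xD3, 0x7D)
    byte[] malformed = {(byte) 0xF3, (byte) 0xAE, (byte) 0xB2, 0x12};
    // 0xF3 starts a 4-byte sequence, but 0x12 is not a continuation byte, so only
    // 0xF3 0xAE 0xB2 belong to the incomplete sequence: hence "Input length = 3"
    StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(malformed)); // throws MalformedInputException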
If you replace REPORT
with REPLACE
(heh) — the error action that the String constructor effectively applies — you will see that the result is identical to the constructor, and length()
will return the value 2 again.
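A minimal sketch of that variant, reusing the structure of the decode method above (only the error action changes):

    var decoder = Charset.defaultCharset().newDecoder();
    // REPLACE substitutes U+FFFD for the malformed sequence instead of throwing,
    // which is effectively what new String(byte[]) does as well
    decoder.onMalformedInput(CodingErrorAction.REPLACE);
    String lenient = decoder.decode(ByteBuffer.wrap(encryptedArray)).toString();
    System.out.println("Length: " + lenient.length()); // 2 again, same as the constructor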
Of course, Topaco is correct when he says you need to use Base64 encoding. Base64 encodes bytes as characters instead, so none of the information in the bytes is lost, and the reverse operation is of course decoding that text back into the original bytes.
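A sketch of that round trip with java.util.Base64 (add import java.util.Base64; the variable names are mine):

    // encode the raw cipher bytes as plain ASCII text; nothing is lost or replaced
    String transportable = Base64.getEncoder().encodeToString(encryptedArray);
    // later: decode the text back to the exact same bytes before XOR-decrypting
    byte[] roundTripped = Base64.getDecoder().decode(transportable);
    System.out.println("Decrypted value: " + new String(xor(roundTripped, test_key)));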