In Java, I've been trying to write a String to a file using UTF-8 encoding which will later be read by another program written in a different programming language. While doing so I noticed that the bytes created when encoding a String into a byte array didn't seem to have the correct byte values.
I narrowed down the problem to the symbol "£" which seems to produce incorrect bytes when encoded to UTF-8
byte[] byteArray = "£".getBytes(Charset.forName("UTF-8"));
// Print out the Byte Array of the UTF-8 converted string
// Upcast byte values to print the bytes as unsigned
for (byte signedByte : byteArray) {
System.out.print((signedByte & 0xFF) + " ");
}
This outputs 6 bytes with the decimal values: 239 190 130 239 189 163, in hex this is: ef be 82 ef bd a3
http://www.utf8-chartable.de/ however says that the values for "£" in hex is: c2 a3, the output should then be: 194 163
Other strings seem to produce correct bytes when encoded as UTF-8, so I'm wondering why Java is producing these 6 bytes for "£", and how I should go about properly converting by Strings to byte arrays using UTF-8 encoding
I have also tried
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8");
out.write("£");
out.close();
but this produced the same 6 bytes