I read in a comment to an answer by @Esailija to a question of mine that
ISO-8859-1 is the only encoding to fully retain the original binary data, with exact byte<->codepoint matches
I also read in this answer by @AaronDigulla that:
In Java, ISO-8859-1 (a.k.a ISO-Latin1) is a 1:1 mapping
I need some insight on this. This will fail (as illustrated here):
// \u00F6 is ö
System.out.println(Arrays.toString("\u00F6".getBytes("utf-8")));
// prints [-61, -74]
System.out.println(Arrays.toString("\u00F6".getBytes("ISO-8859-1")));
// prints [-10]
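For context, the property the quotes describe (every byte value surviving a decode and re-encode unchanged) can be checked with a small sketch like the one below. The class name `Latin1RoundTrip` and the method `roundTrips` are made up for illustration. It only tests the charsets passed to it, so it says nothing about where the behavior is specified or whether ISO-8859-1 is the only charset with it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of any library.
class Latin1RoundTrip {

    // Decodes all 256 possible byte values with the given charset,
    // encodes the resulting String back, and compares with the input.
    static boolean roundTrips(Charset cs) {
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }
        String decoded = new String(original, cs);
        byte[] encoded = decoded.getBytes(cs);
        return Arrays.equals(original, encoded);
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(StandardCharsets.ISO_8859_1)); // prints true
        System.out.println(roundTrips(StandardCharsets.UTF_8));      // prints false
    }
}
```

With UTF-8, the lone bytes 0x80 to 0xFF are malformed input, so `new String(byte[], Charset)` substitutes the replacement character and the original bytes cannot be recovered.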
Questions
- I admit I do not quite get it: why does it not get the bytes in the code above?
- Most importantly, where is this (the byte-preserving behavior of ISO-8859-1) specified? Links to the source or the JLS would be nice. Is it the only encoding with this property?
- Is it related to ISO-8859-1 being the default default?
See also this question for nice counter examples from other charsets.