I have a small piece of code in which I check the code point of the character Ü.
Locale lc = Locale.getDefault();
System.out.println(lc.toString());
System.out.println(Charset.defaultCharset());
System.out.println(System.getProperty("file.encoding"));
String inUnicode = "\u00dc";
String glyph = "Ü";
System.out.println("inUnicode " + inUnicode + " code point " + inUnicode.codePointAt(0));
System.out.println("glyph " + glyph + " code point " + glyph.codePointAt(0));
I get a different code point value when I run this code on macOS and on Windows 10; see the output below.
Output on MacOS
en_US
UTF-8
UTF-8
inUnicode Ü code point 220
glyph Ü code point 220
Output on Windows
en_US
windows-1252
Cp1252
inUnicode Ü code point 220
glyph ?? code point 195
I checked the code page for windows-1252 at https://en.wikipedia.org/wiki/Windows-1252#Character_set; there the code point for Ü is 220.
For String glyph = "Ü"; why do I get 195 as the code point on Windows? As I understand it, glyph should have been rendered properly and the code point should have been 220, since Ü is defined in Windows-1252.
If I replace String glyph = "Ü"; with String glyph = new String("Ü".getBytes(), Charset.forName("UTF-8")); then glyph is rendered correctly and the code point value is 220.
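To narrow this down, here is a minimal sketch that reproduces both numbers on any OS without depending on the default charset. It rests on an assumption I have not confirmed: that my source file is saved as UTF-8 and is read as Cp1252 when compiled on Windows. The class name EncodingMismatchDemo and the helper reinterpret are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: encode text with one charset, then decode the
    // resulting bytes with another, mimicking a file written in one
    // encoding and read in a different one.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Assumption: the literal "Ü" is stored as UTF-8 (bytes C3 9C)
        // but decoded as windows-1252, which yields two characters.
        String misread = reinterpret("\u00dc", StandardCharsets.UTF_8, cp1252);
        System.out.println(misread.length());        // 2
        System.out.println(misread.codePointAt(0));  // 195

        // The workaround in explicit form: turn the misread characters back
        // into their windows-1252 bytes and decode those bytes as UTF-8.
        String repaired = reinterpret(misread, cp1252, StandardCharsets.UTF_8);
        System.out.println(repaired.codePointAt(0)); // 220
    }
}
```

If that assumption holds, the 195 would come from the first UTF-8 byte of Ü (0xC3) being read as a separate windows-1252 character, and my workaround would only succeed because getBytes() without an argument happens to use the same Cp1252 default on that machine.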
Is this a correct and efficient way to standardize the behavior of String on any OS, irrespective of locale and charset?