public class UTF8 {
    public static void main(String[] args) {
        String s = "ヨ"; // 0xFF6E
        System.out.println(s.getBytes().length); // number of bytes in the platform default charset, not the string length
        System.out.println(s.charAt(0)); // first char (UTF-16 code unit) of the string
    }
}
output:
3
ヨ
Please help me understand this. I am trying to understand how UTF-8 encoding works in Java. The Java documentation defines char as follows: "The char data type is a single 16-bit Unicode character."
Does this mean that the char type in Java can only represent Unicode characters that fit in 2 bytes, and nothing larger?
In the program above, the number of bytes reported for the string is 3, yet the third line of main returns the first character as a char, which is 2 bytes in Java. How can a 2-byte char hold a character that is 3 bytes long? I am really confused here.
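To make the comparison concrete, here is a small sketch that separates the number of chars in a string from the number of bytes each encoding produces. The class name CharVsBytes and its helper methods are hypothetical names made up for this experiment. It assumes the character is U+FF6E, as in the comment above, and uses one emoji (U+1F600) as an example of a character outside the 16-bit range.

```java
import java.nio.charset.StandardCharsets;

public class CharVsBytes {
    // Number of Java chars (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points, which can be fewer than the char count.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when the string is encoded as UTF-16 (big-endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String kana = "\uFF6E";        // assumed to be the character from the question
        String emoji = "\uD83D\uDE00"; // U+1F600, written as a surrogate pair

        System.out.println(charCount(kana));       // 1
        System.out.println(utf8Bytes(kana));       // 3
        System.out.println(utf16Bytes(kana));      // 2

        System.out.println(charCount(emoji));      // 2
        System.out.println(codePointCount(emoji)); // 1
        System.out.println(utf8Bytes(emoji));      // 4
    }
}
```

If I read this right, the 3 in my output is the size of the UTF-8 encoding produced by getBytes(), while the char itself is a 16-bit UTF-16 value, so the two numbers measure different things.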
Any good references on this concept, for Java or in general, would be much appreciated.