
Java's internal encoding for chars is UTF-16, right? Since even ASCII characters use a 2-byte encoding, I expect:

     String h="hello"; 
     System.out.println(h.codePointCount(0,h.length())); 
     System.out.println(h.length()); 

to print 10 and 5, but in fact it prints 5, 5.

Where did I get wrong?

Troskyvs
  • The answer to this question is here: https://stackoverflow.com/questions/5078314/isnt-the-size-of-character-in-java-2-bytes – Centos Nov 20 '18 at 12:16
  • `codePointCount` basically is a more exact version of `length` that works correctly for surrogate pairs. For ASCII characters (more generally BMP characters) there is no difference. – Henry Nov 20 '18 at 12:41

1 Answer


Try a string containing a character outside the Basic Multilingual Plane, e.g. U+1D11E (MUSICAL SYMBOL G CLEF):

    String h = "hell\uD834\uDD1E";
    System.out.println(h.codePointCount(0, h.length())); // 5
    System.out.println(h.length());                      // 6

It prints 5, 6.

U+1D11E is represented by two code units (a surrogate pair); each of 'h', 'e', 'l', 'l' by one.

And about UTF-16: "The encoding is variable-length, as code points are encoded with one or two 16-bit code units..."
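The variable-length behavior is easy to observe directly. A minimal sketch, using U+1F600 (an arbitrary supplementary character, not one from the question) alongside an ASCII character:

```java
public class CodeUnitsDemo {
    public static void main(String[] args) {
        // "a" is one UTF-16 code unit; U+1F600 needs two (a surrogate pair)
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());                      // 3 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        // Iterating by code point avoids splitting surrogate pairs
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

For BMP-only strings like `"hello"`, every code point is a single code unit, so the two counts agree.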

  • In case it's not clear, the question arises from confusing Unicode _codepoints_ with UTF-16 _code units_. Codepoints are not encoded. – Tom Blodget Nov 20 '18 at 14:43
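The codepoint/code-unit distinction in that comment can be sketched like this: a code point is an abstract number, and UTF-16 is one way of encoding it into 16-bit units (U+1D11E here is chosen arbitrarily as an example):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        int cp = 0x1D11E; // one code point: MUSICAL SYMBOL G CLEF
        // UTF-16 encodes it as two 16-bit code units (a surrogate pair)
        char[] units = Character.toChars(cp);
        System.out.println(units.length); // 2
        String s = new String(units);
        // The same single code point occupies 4 bytes in both encodings
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 4
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 4
    }
}
```

So `length()` counts UTF-16 code units, while `codePointCount` counts the abstract code points regardless of how many units each one needs.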