
I ran into a tricky case and couldn't find an explanation of why this happens.

The main problem is the length of the string: does it contain one character or two?

Code:

public class App {
    public static void main(String[] args) throws Exception {
        char ch0 = 55378;
        char ch1 = 56816;
        String str = new String(new char[]{ch0, ch1});
        System.out.println(str);
        System.out.println(str.length());
        System.out.println(str.codePointCount(0, 2));
        System.out.println(str.charAt(0));
        System.out.println(str.charAt(1));
    }
}

Output:

?
2
1
?
?

Any suggestions?

catch23

1 Answer


Does it contain one character or two?

It contains one Unicode character, which is made up of 2 UTF-16 code units. Every char in Java is a UTF-16 code unit... it may not be a whole character. Each character has a single code point - Unicode provides a coded character set that maps each character to an integer representing that character (the code point).
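
To illustrate with your two values (just a sketch, the class name is made up for the example): 55378 is 0xD852, a high (lead) surrogate, and 56816 is 0xDDF0, a low (trail) surrogate. Together they encode the single supplementary code point U+249F0:

public class SurrogateDemo {
    public static void main(String[] args) {
        char high = 55378; // 0xD852 - a high (lead) surrogate
        char low  = 56816; // 0xDDF0 - a low (trail) surrogate

        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLowSurrogate(low));   // true

        // The pair decodes to one supplementary code point, U+249F0.
        int codePoint = Character.toCodePoint(high, low);
        System.out.println(Integer.toHexString(codePoint));  // 249f0

        // A supplementary code point takes two chars (code units) in UTF-16.
        System.out.println(Character.charCount(codePoint));  // 2
    }
}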

length() returns the number of code units, whereas codePointCount returns the number of code points.
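
If you want to walk the string character by character (i.e. code point by code point) rather than char by char, step through it with codePointAt and advance by Character.charCount - a rough sketch, again with an illustrative class name:

public class CodePointIteration {
    public static void main(String[] args) {
        // Same string as in the question: one code point, two chars.
        String str = new String(new char[]{55378, 56816});

        // Advance by the number of chars each code point occupies.
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            System.out.printf("U+%X%n", cp);   // U+249F0
            i += Character.charCount(cp);
        }

        // On Java 8+ you can also stream the code points directly.
        str.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp));
    }
}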

You may want to look at my article about encodings in .NET - the terminology all translates fine (as it's standard terminology), so just ignore the .NET-specific parts.

Jon Skeet