A Java String is in UTF-16 format:
A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
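The quoted rule about index values can be seen directly with a supplementary character such as U+1F600, which lies above U+FFFF. The following is a minimal sketch; the class name CodeUnitDemo and the field SAMPLE are illustrative, not part of any API. It compares String.length() (char code units) with codePointCount() (code points):

```java
public class CodeUnitDemo {
    // 'A' followed by the surrogate pair for U+1F600 (high U+D83D, low U+DE00).
    static final String SAMPLE = "A\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts char code units: the emoji occupies two positions.
        System.out.println(SAMPLE.length());                             // 3
        // codePointCount() counts Unicode code points instead.
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));   // 2
        // Index 1 holds the high surrogate, index 2 the low surrogate.
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(SAMPLE.charAt(2)));  // true
        // codePointAt() reassembles the pair into a single code point.
        System.out.printf("U+%X%n", SAMPLE.codePointAt(1));              // U+1F600
    }
}
```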
UTF-16 is:
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.
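The figure 1,112,064 is the full code space U+0000..U+10FFFF (1,114,112 code points) minus the 2,048 surrogate code points U+D800..U+DFFF, which UTF-16 reserves for building surrogate pairs and which are not characters in their own right. A small sketch that checks the arithmetic (the class and method names are illustrative):

```java
public class CodePointCount {
    // Counts code points in U+0000..U+10FFFF that are not surrogates,
    // i.e. the values that UTF-16 and UTF-8 are able to encode.
    static int countScalarValues() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean surrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (!surrogate) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 0x110000 code points in total, minus the 0x800 surrogate code points.
        System.out.println(Character.MAX_CODE_POINT + 1); // 1114112
        System.out.println(countScalarValues());          // 1112064
    }
}
```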
UTF-8 is:
UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes.
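The "one to four 8-bit bytes" can be observed by encoding one code point from each length class. This is a sketch using String.getBytes(StandardCharsets.UTF_8); the helper utf8Length is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Number of bytes UTF-8 needs for a single (non-surrogate) code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', 'é', '€', and the supplementary character U+1F600.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            // Prints 1, 2, 3 and 4 bytes respectively.
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

Note that the four-byte case is exactly the one that needs a surrogate pair (two chars) in the Java String.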
It follows that every character that can appear in a Java String can be encoded in UTF-8.
Therefore StandardCharsets.UTF_8.newEncoder().canEncode(input) should always return true.
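A quick way to probe this reasoning is to call canEncode on a few inputs. The sketch below (the wrapper canEncodeUtf8 is hypothetical) creates a fresh encoder per call, since CharsetEncoder instances are not safe for concurrent use. Well-formed strings, including ones with surrogate pairs, return true. The argument does, however, assume every surrogate char in the String is part of a pair: String does not enforce this, and an unpaired surrogate is not one of the 1,112,064 encodable code points, so the UTF-8 encoder treats it as malformed input and canEncode returns false.

```java
import java.nio.charset.StandardCharsets;

public class CanEncodeCheck {
    // Fresh encoder per call: CharsetEncoder is stateful and not thread-safe.
    static boolean canEncodeUtf8(String input) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(input);
    }

    public static void main(String[] args) {
        System.out.println(canEncodeUtf8("plain ASCII"));                  // true
        // 'é' plus a correctly paired surrogate (U+1F600).
        System.out.println(canEncodeUtf8("caf\u00E9 \uD83D\uDE00"));       // true
        // High surrogate followed by 'b' instead of a low surrogate.
        System.out.println(canEncodeUtf8("a\uD800b"));                     // false
    }
}
```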