For any given Java String s, I would like to know if the array of characters represented by s is guaranteed to be a valid UTF-16 string, e.g.:
final char[] ch = new char[s.length()];
for (int i = 0; i < ch.length; ++i) {
    ch[i] = s.charAt(i);
}
// Is ch guaranteed to be a valid UTF-16 encoded string?
If not, what are some simple Java-language test cases that produce invalid UTF-16?
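To make "valid UTF-16" concrete, here is a minimal sketch of the kind of check and test case I have in mind. The class name Utf16Check and the helper isWellFormedUtf16 are made up for illustration and are not part of the JDK; the sketch assumes "valid" means only that every surrogate code unit is part of a high-then-low pair. It relies on Character.isHighSurrogate, Character.isLowSurrogate, and Character.toChars, which are standard.

```java
public class Utf16Check {
    // Hypothetical helper: returns true if every surrogate code unit in s
    // belongs to a high-surrogate/low-surrogate pair, in that order.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); ++i) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                ++i; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so toChars yields a proper surrogate pair.
        String paired = new String(Character.toChars(0x1F600));
        // Candidate test case: a lone high surrogate between two ASCII letters.
        String lone = new String(new char[] { 'a', (char) 0xD800, 'b' });

        System.out.println(isWellFormedUtf16(paired));
        System.out.println(isWellFormedUtf16(lone));
    }
}
```

The String(char[]) constructor copies the array without checking surrogate pairing, so the second string keeps its lone surrogate and the helper reports it as not well formed.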
EDIT: Somebody has flagged the question as a possible duplicate of "Is a Java char array always a valid UTF-16 (Big Endian) encoding?". All I can say is that there is a difference between a String and a char[], and a reason why the former might, at least theoretically, have guarantees as to its contents that the latter does not. I'm not asking a question about arrays; I'm asking a question about Strings.