How does the StringTokenizer class identify them as separate characters?
String has two methods, charAt and codePointAt, which return the character or the code point at a given index:
"abc".charAt(0) // 'a'
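The two methods only differ for characters outside the Basic Multilingual Plane, which Java stores as two char values (a surrogate pair). Here is a minimal sketch of that difference; the class name CodePointDemo and the helper codePointsOf are made up for illustration and are not part of the JDK:

```java
public class CodePointDemo {
    // Collects the code points of s, advancing one or two chars per step.
    static int[] codePointsOf(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        for (int i = 0, j = 0; j < s.length(); i++) {
            int cp = s.codePointAt(j);
            result[i] = cp;
            j += Character.charCount(cp); // 1 for BMP characters, 2 for surrogate pairs
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which is stored as two chars
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());             // 3
        System.out.println((int) s.charAt(1));      // 55357 (0xD83D, the high surrogate only)
        System.out.println(s.codePointAt(1));       // 128512 (0x1F600, the full code point)
        System.out.println(codePointsOf(s).length); // 2
    }
}
```

For plain ASCII delimiters such as ".?!", charAt and codePointAt return the same values, so the distinction does not matter there.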
The StringTokenizer implementation will use both of these methods on the delimiters passed in at some point. In my version of the JDK, when the delimiters contain surrogate pairs (the hasSurrogates check in the excerpt), the code points of the delimiters string are extracted and added to an array, delimiterCodePoints, in a method called setMaxDelimCodePoint, which is called by the constructor:
private void setMaxDelimCodePoint() {
    // ...
    if (hasSurrogates) {
        delimiterCodePoints = new int[count];
        for (int i = 0, j = 0; i < count; i++, j += Character.charCount(c)) {
            c = delimiters.codePointAt(j); // <--- notice this line
            delimiterCodePoints[i] = c;
        }
    }
}
And then this array is accessed in the isDelimiter method, which decides whether a code point is a delimiter:
private boolean isDelimiter(int codePoint) {
    for (int i = 0; i < delimiterCodePoints.length; i++) {
        if (delimiterCodePoints[i] == codePoint) {
            return true;
        }
    }
    return false;
}
Of course, this is not the only way that the API could have been designed. The constructor could have accepted an array of char as delimiters instead, but I am not qualified to say why the designers did it this way.
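To see the per-character behaviour from the outside, here is a small sketch that collects the tokens into a list. The class name TokenizerDemo and the helper tokens are hypothetical names I picked for the example; only StringTokenizer itself is the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Splits text with StringTokenizer, treating every char of delimiters
    // as an independent delimiter.
    static List<String> tokens(String text, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Each of '.', '?' and '!' ends a token on its own.
        System.out.println(tokens("Hi.How are you?Fine!", ".?!"));
        // prints [Hi, How are you, Fine]
    }
}
```

Consecutive delimiters are skipped as a group, so tokens("a.?!b", ".?!") gives [a, b] as well, but that is a side effect of each character being a delimiter, not of ".?!" being matched as a unit.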
Why is ".?!" not treated as a single delimiter?
StringTokenizer only supports single-character delimiters. If you want a whole string as a delimiter, you can use Scanner or String.split instead. For both of these, the delimiter is represented as a regular expression, so you have to escape the special characters and use "\\.\\?!" instead. You can learn more about regular expressions here.
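As a rough sketch of both alternatives, assuming the input is an ordinary in-memory string (the class name MultiCharDelimiterDemo and its helper methods are invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class MultiCharDelimiterDemo {
    // Regex that matches the literal three-character string .?!
    static final String DELIM = "\\.\\?!";

    static List<String> withSplit(String text) {
        return Arrays.asList(text.split(DELIM));
    }

    static List<String> withScanner(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useDelimiter(DELIM)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one.?!two.three.?!four";
        // The lone '.' inside "two.three" is not a delimiter here.
        System.out.println(withSplit(text));   // [one, two.three, four]
        System.out.println(withScanner(text)); // [one, two.three, four]
    }
}
```

If you would rather not escape by hand, Pattern.quote(".?!") from java.util.regex produces a pattern that matches the same literal string.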