ANSWER GIVEN, SEE BELOW -- moral: never call .split() alone; if you want sane behaviour, always give it a limit argument of -1. But not 0!
The javadoc for Pattern.split() states the following:
The array returned by this method contains each substring of the input sequence that is terminated by another subsequence that matches this pattern or is terminated by the end of the input sequence.
Witness this code:
    private static final Pattern UNDERSCORE = Pattern.compile("_");

    public static void main(final String... args)
    {
        System.out.println(UNDERSCORE.split("_").length);
    }
Now, referring to the javadoc, the array should contain substrings of the input which are either (quoting):
- "terminated by another subsequence that matches this pattern": well, there is one -- the empty string right before the underscore (which UNDERSCORE obviously matches);
- or "is terminated by the end of the input sequence": there is one too: the empty string right after the underscore.
Yet, the above code prints 0. Why? Is this a known bug? (imnsho yes, see below) What are other cases where .split() does not obey its contract? (again, see below)
THE ANSWER (right below this explanatory text)
When using a Pattern, the single-argument .split() method is equivalent to calling the two-argument method with 0 as the limit argument.
And this is where the bug lies. With an argument of 0, all empty strings from the end of the array "down to" the first non-empty element are removed from the result. An input consisting only of delimiters therefore yields an empty array.
If, prior to reading this, you didn't know what a braindead design decision was, now you know. And it is all the more dangerous since this is the default behaviour.
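To make the trimming rule concrete, here is a minimal sketch that prints the arrays themselves rather than their lengths. The class name TrailingEmptyDemo and the helper splitDefault are made up for this illustration; the only library calls are Pattern.split(CharSequence, int) and Arrays.toString.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TrailingEmptyDemo {
    private static final Pattern UNDERSCORE = Pattern.compile("_");

    // Hypothetical helper: same result as UNDERSCORE.split(input),
    // with the implicit limit of 0 spelled out.
    static String[] splitDefault(String input) {
        return UNDERSCORE.split(input, 0);
    }

    public static void main(String[] args) {
        // Only trailing empty strings are dropped; leading and interior ones stay.
        System.out.println(Arrays.toString(splitDefault("a_b__"))); // [a, b]
        System.out.println(Arrays.toString(splitDefault("_a")));    // [, a]
        System.out.println(Arrays.toString(splitDefault("a__b")));  // [a, , b]
        System.out.println(Arrays.toString(splitDefault("_")));     // []
    }
}
```

Note the asymmetry: "_a" keeps its leading empty string while "a_" would lose its trailing one, so the number of fields depends on where the empty ones happen to sit.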
The solution is to always use the full form of the .split() method and give it a negative limit argument. Here, -1 is chosen. And in this case, .split() behaves sanely:
    private static final Pattern UNDERSCORE = Pattern.compile("_");

    public static void main(final String... args)
    {
        System.out.println(UNDERSCORE.split("_").length);
        System.out.println(UNDERSCORE.split("__").length);
        System.out.println(UNDERSCORE.split("_x_").length);
        System.out.println(UNDERSCORE.split("_", -1).length);
        System.out.println(UNDERSCORE.split("__", -1).length);
        System.out.println(UNDERSCORE.split("_x_", -1).length);
    }
Output:
    0 # BUG!
    0 # BUG!
    2 # BUG!
    2 # OK
    3 # OK
    3 # OK
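If remembering the -1 everywhere feels error-prone, one option is to hide it behind a small wrapper. This is only a sketch: the class SplitAll and the method splitAll are hypothetical names, not part of the JDK, and the wrapper does nothing beyond forwarding to Pattern.split(CharSequence, int) with a negative limit.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitAll {
    private static final Pattern UNDERSCORE = Pattern.compile("_");

    // Hypothetical helper: always passes a negative limit,
    // so trailing empty strings are kept.
    static String[] splitAll(Pattern pattern, CharSequence input) {
        return pattern.split(input, -1);
    }

    public static void main(String[] args) {
        // Left: default split (limit 0). Right: the wrapper (limit -1).
        for (String input : new String[] {"_", "__", "_x_"}) {
            System.out.println(Arrays.toString(UNDERSCORE.split(input))
                + " vs " + Arrays.toString(splitAll(UNDERSCORE, input)));
        }
        // [] vs [, ]
        // [] vs [, , ]
        // [, x] vs [, x, ]
    }
}
```

With the wrapper, an input containing n delimiters always produces n + 1 fields, which is what the quoted javadoc sentence leads you to expect.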