It's because of how the Matcher class handles patterns that could match an empty string. The replaceAll method of String is defined to work the same way as the replaceAll method of Matcher, which works like this:
This method first resets this matcher. It then scans the input
sequence looking for a match of the pattern. Characters that are not
part of the match are appended directly to the result string; the
match is replaced in the result by the replacement string. The
replacement string may contain references to captured subsequences as
in the appendReplacement method.
When the matcher searches for the pattern and the match it finds is the empty string, it reports that empty match but then advances the current index by 1, so that it does not keep returning the same empty match forever. So here's how it operates on "Hello":
1) The matcher looks for .* starting at index 0. Since this is a greedy match, matching as many characters as possible, it finds the substring "Hello" and replaces it with "US". The current index is then positioned after the 'o'.
2) The matcher looks for .* again. It's at the end of the input, but the pattern is allowed to match an empty string, so it matches the empty string and replaces that with another "US". It then bumps up the current index, which is now past the end of the source.
3) The matcher looks for .* again, but since the current index is past the end of the source, it won't find anything.
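The three steps above can be observed directly by calling find() in a loop and printing where each match starts and ends. This is only a small sketch: the class name EmptyMatchTrace and the helper trace are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class EmptyMatchTrace {
    // Returns each match as "[start,end)" in the order find() reports it.
    static List<String> trace(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // First the greedy match over "Hello", then an empty match at the end.
        System.out.println(trace(".*", "Hello"));            // [[0,5), [5,5)]
        // Both matches are replaced, hence the doubled replacement.
        System.out.println("Hello".replaceAll(".*", "US"));  // USUS
    }
}
```

The second span, [5,5), is the empty match from step 2; the loop then ends because the third find() returns false.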
To get a feel for how it operates, try using ".*?" as the pattern. Now the matcher will always match an empty string, because the ? makes the quantifier reluctant, telling it to use the shortest string possible. It also increases the current index by 1 each time it finds an empty string. The result:
a.replaceAll("(?s).*?", ".-") // returns ".-H.-e.-l.-l.-o.-"
That is, it replaces every empty string (the one before the first character, the ones between each pair of characters, and the one after the last character) with ".-", and leaves the actual characters alone.
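To see where those empty matches sit, one can collect the start index of every zero-length match. Again a sketch only; ReluctantMatchPositions and emptyMatchStarts are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ReluctantMatchPositions {
    // Collects the start index of every zero-length match found in the input.
    static List<Integer> emptyMatchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.start() == m.end()) {
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // One empty match at each of the six positions in and around "Hello".
        System.out.println(emptyMatchStarts("(?s).*?", "Hello")); // [0, 1, 2, 3, 4, 5]
    }
}
```

Six empty matches line up with the six ".-" insertions in the result above.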
Moral: Be really careful with patterns that could match empty strings.
MORE: After reading your comment, where you indicate that the pattern could be input by the user, I think you could use this as a test to see if the pattern could match the empty string:
if ("".matches(inputPattern)) {
// ???
}
I'm not sure what you'd do with it. Perhaps it's always the case that if this is true, your replaceAll will add an extra US at the end and you can safely delete it. Or maybe you can just tell them to try a different pattern.
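A minimal sketch of that test wrapped in a method, assuming the user's pattern arrives as a plain String. The names EmptyPatternCheck and matchesEmptyInput are invented for this example, and what to do when the check returns true is left open, as above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class EmptyPatternCheck {
    // True if the regex matches the empty input, same as "".matches(regex).
    // Throws PatternSyntaxException if the regex does not compile.
    static boolean matchesEmptyInput(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesEmptyInput(".*"));    // true
        System.out.println(matchesEmptyInput("a+"));    // false
        // A lookahead can match an empty string inside "Hello" but not in "".
        System.out.println(matchesEmptyInput("(?=H)")); // false
    }
}
```

Note that this only asks whether the pattern matches an empty input. A pattern such as (?=H) fails the check yet still produces an empty match inside "Hello", so the check catches cases like .* but is not a complete detector of empty matches.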
PPS. I'm not sure where this behavior of the matcher (i.e. increasing the current index by 1 when the match is an empty string) is documented. I didn't see it in the Matcher javadoc. I suppose that means that a future version of the JRE could behave differently, although this seems highly unlikely.