6

Possible Duplicate:
Java regex anomaly?

Any idea why the following test fails (it returns "xx" instead of "x")?

@Test 
public void testReplaceAll(){
    assertEquals("x", "xyz".replaceAll(".*", "x"));
}

I don't want to use "^.*$"; I want to understand this behavior. Any clues?

ekeren

2 Answers

9

Yes, it is exactly the same as described in this question!

.* will first match the whole input, but then also an empty string at the end of the input...

Let's symbolize the regex engine with | and the input with <...> in your example.

  • input: <xyz>;
  • regex engine, before first run: <|xyz>;
  • regex engine, after first run: <xyz|> (matched text: "xyz");
  • regex engine, after second run: <xyz>| (matched text: "").

Not all regex engines behave this way. Java does, however, and so does Perl. sed, as a counterexample, will position its cursor after the end of the input in step 3.

Now, you also have to understand one crucial thing: when a regex engine encounters a zero-length match, it always advances one character before looking for the next match. Otherwise, consider what would happen if you attempted to replace '^' with 'a': '^' matches a position, so it is a zero-length match. If the engine didn't advance one character, it would keep finding the same empty match at the same position: "x" would be replaced with "ax", which would be replaced with "aax", and so on. So, after the second match, which is empty, Java's regex engine advances one "character"... of which there aren't any: end of processing.
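The two matches described above can be observed directly with java.util.regex. This is only a minimal sketch: the class name MatchTrace and the helper trace are made up for illustration, and each match is printed as start-end:text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class MatchTrace {
    // Collects every match of regex in input, formatted as "start-end:text".
    static List<String> trace(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // First match is the whole input, second is the empty string at the end.
        System.out.println(trace(".*", "xyz"));          // [0-3:xyz, 3-3:]
        // Each of the two matches is replaced by "x".
        System.out.println("xyz".replaceAll(".*", "x")); // xx
        // '^' is a zero-length match; it is replaced once, then the engine moves on.
        System.out.println("x".replaceAll("^", "a"));    // ax
    }
}
```

The second entry, 3-3, is the empty match at the end of the input, which is why replaceAll produces two replacements.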

fge
  • But isn't it supposed to be greedy matching? – ekeren Dec 29 '11 at 19:51
  • Well, it _is_ greedy, isn't it? Remember that the `*` means "**ZERO** or more". So, `.*` is perfectly satisfied with an empty string! – fge Dec 29 '11 at 19:55
  • After the edit I understand: Java iterates over its findings, and when it reaches the end of the input it also produces a match (the empty match). Thanks! – ekeren Dec 29 '11 at 19:56
  • See edit: there is a very important thing to understand as well with this example. – fge Dec 29 '11 at 20:05
0
@Test 
public void testReplaceAll(){
    assertEquals("x", "xyz".replaceAll(".+", "x"));
}

This would probably do the trick: .+ requires one or more characters, which prevents the behaviour where .* matches zero characters at the end of the input and replaces that empty match with "x".
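As a rough comparison, here is how .+ behaves next to replaceFirst, which is another way to avoid the second replacement. The replaceFirst variant and the class name ReplaceVariants are additions for illustration, not part of the answer above. Note that the two differ on empty input.

```java
// Hypothetical comparison class; names are made up for this sketch.
class ReplaceVariants {
    // Requires at least one character, so the empty match at the end cannot occur.
    static String plus(String input) {
        return input.replaceAll(".+", "x");
    }

    // Replaces only the first match, so the trailing empty match is never visited.
    static String first(String input) {
        return input.replaceFirst(".*", "x");
    }

    public static void main(String[] args) {
        System.out.println(plus("xyz"));          // x
        System.out.println(first("xyz"));         // x
        System.out.println(plus("").isEmpty());   // true: .+ finds nothing to replace
        System.out.println(first(""));            // x: .* matches the empty input once
    }
}
```

Which one fits depends on whether an empty input should also be replaced with "x".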

Willem Mulder