
Here is the code that’s causing the confusion:

String s = "one\ntwo\nthree\n";
s = s.replaceAll("^", "START");
s = s.replaceAll("$", "END");
System.out.print(s);

Here is the output:

STARTone
two
threeEND
END

I know that the dollar sign matches the end of the line in Java, but I thought it would act the same even when the string ends with \n. Why was “END” printed twice?

sp00m
tal

1 Answer

Java regex treats the input as a single line by default

Refer to the "Line terminators" section of the Pattern documentation:

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.

Since MULTILINE mode is not enabled here, this passage suggests that $ should match only at the end of the input, not before the final \n. The second END therefore looks like a bug, or at least a gap in the documentation, as mentioned by RotoRa.

A hint about this behavior comes from the source comment on Pattern#Dollar, the internal class that handles $:

When not in multiline mode, the $ can only match at the very end of the input, unless the input ends in a line terminator in which it matches right before the last line terminator.

The input "one\ntwo\nthree\n" ends with \n, so it falls under the "unless the input ends in a line terminator" case: $ matches once right before the last \n and once at the very end of the input. That is why END is printed twice.
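
To make the two match positions visible, here is a minimal sketch that lists where each pattern matches in the question's input. The class name DollarPositions and the helper matchPositions are made up for this illustration; only Pattern, Matcher, and the \z boundary (end of input, ignoring any final terminator) come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
final class DollarPositions {

    // Collects the start index of every match of the regex in the input.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Length 14; the \n characters sit at indexes 3, 7 and 13.
        String s = "one\ntwo\nthree\n";

        // Default mode: before the final \n and at the very end.
        System.out.println(matchPositions("$", s));      // [13, 14]

        // \z matches only at the very end of the input.
        System.out.println(matchPositions("\\z", s));    // [14]

        // MULTILINE mode: before every \n and at the very end.
        System.out.println(matchPositions("(?m)$", s));  // [3, 7, 13, 14]
    }
}
```

If a single END at the very end of the string is what you want, replacing \z instead of $ should do it, since \z does not match before the final line terminator.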

Solution: match before every \n

If you want $ to match before every \n, enable MULTILINE mode with the embedded flag expression (?m), for example s = s.replaceAll("(?m)^", "START").replaceAll("(?m)$", "END");. You will see

STARToneEND
STARTtwoEND
STARTthreeEND
END

this time. Note that there is still an extra END on its own line at the end, because the end of the input sequence is matched as well.
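
If that trailing END is unwanted, one possible alternative (a sketch, not what the answer above proposes) is to wrap each non-empty line in a single pass. It assumes empty lines should be left untouched. The class name LineWrap and the method wrapLines are hypothetical.

```java
// Hypothetical helper class, shown only as an illustration.
final class LineWrap {

    // Wraps every non-empty line in START...END in one pass.
    // (?m) makes ^ and $ match at line boundaries; .+ needs at least one
    // character and never crosses a line terminator, so the empty tail
    // after the final \n is not matched.
    static String wrapLines(String input) {
        return input.replaceAll("(?m)^(.+)$", "START$1END");
    }

    public static void main(String[] args) {
        String s = "one\ntwo\nthree\n";
        System.out.print(wrapLines(s));
        // STARToneEND
        // STARTtwoEND
        // STARTthreeEND
    }
}
```

In the replacement string, $1 refers to the captured line content, so START and END are added around each line without a second replaceAll call.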

samabcde