I have a problem using replaceAll on a multiline string:

String regex = "\\s*/\\*.*\\*/";
String testWorks = " /** this should be replaced **/ just text";
String testIllegal = " /** this should be replaced \n **/ just text";

testWorks.replaceAll(regex, "x");   // returns "x just text"
testIllegal.replaceAll(regex, "x"); // returns the input unchanged

The above works for testWorks, but not for testIllegal. Why is that, and how can I work around it? I need to replace something like a /* ... */ comment that spans multiple lines.

mtk
Robert
  • And what about this string: `"String s = \"/*\"; /* comment */"` – Bart Kiers Nov 11 '10 at 12:28
  • Well, the point is that the regex should match only at the beginning of the string. It now looks like this: (?s)^\\s*/\\*.*\\*/ I am not sure, though, whether to make it reluctant: (?s)^\\s*/\\*.*?\\*/ – Robert Nov 11 '10 at 12:41

3 Answers

By default, the dot does not match newlines. You need to use the Pattern.DOTALL flag to make it match them as well, e.g.

Pattern.compile(regex, Pattern.DOTALL).matcher(testIllegal).replaceAll("x")

Alternatively, specify the flag inside the pattern itself using (?s), e.g.

String regex = "(?s)\\s*/\\*.*\\*/";
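
Both variants can be compared side by side. The following is a minimal sketch using the test string from the question; the class and method names (DotAllDemo, withFlag, withInlineFlag) are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // The pattern from the question, unchanged.
    static final String REGEX = "\\s*/\\*.*\\*/";

    // Option 1: pass Pattern.DOTALL when compiling the pattern.
    static String withFlag(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).replaceAll("x");
    }

    // Option 2: embed the same flag in the pattern text with (?s).
    static String withInlineFlag(String input) {
        return input.replaceAll("(?s)" + REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";
        // Without the flag, the dot stops at the newline and nothing is replaced.
        System.out.println(testIllegal.replaceAll(REGEX, "x"));
        System.out.println(withFlag(testIllegal));       // x just text
        System.out.println(withInlineFlag(testIllegal)); // x just text
    }
}
```

The two options behave the same way; the inline form is convenient when only a String can be passed, as with String.replaceAll.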
rogerdpack
mikej
  • This is the best solution because it does not interact with the regex string itself; you just specify a flag. I did not know that, thanks! – Robert Nov 11 '10 at 12:31
  • If you have multiple "multi-line" comments, this method will also remove the text between those comments. Use the method posted by Boris instead. – lepe Nov 29 '11 at 03:58
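
The pitfall in the last comment comes from .* being greedy: with DOTALL enabled it runs to the last */ in the input. A minimal sketch of the difference follows, assuming the reluctant quantifier .*? mentioned in the comments on the question is an acceptable fix (it may differ from the answer by Boris referred to above). The class name, the sample input, and the empty replacement string are assumptions made for readability.

```java
public class ReluctantDemo {
    // Greedy: .* extends to the LAST "*/" in the input.
    static final String GREEDY = "(?s)\\s*/\\*.*\\*/";
    // Reluctant: .*? stops at the FIRST "*/" after each "/*".
    static final String RELUCTANT = "(?s)\\s*/\\*.*?\\*/";

    // Removes every match of the given regex from the input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String two = "/* first\n */ keep me /* second */ tail";
        // Greedy swallows " keep me " together with both comments.
        System.out.println("[" + strip(GREEDY, two) + "]");    // [ tail]
        // Reluctant removes each comment separately.
        System.out.println("[" + strip(RELUCTANT, two) + "]"); // [ keep me tail]
    }
}
```

Neither variant understands string literals, so input such as the one in Bart Kiers's comment would still be handled incorrectly.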

Pass Pattern.DOTALL to Pattern.compile, or add (?s) to the pattern.

This would work:

String regex = "(?s)\\s*/\\*.*\\*/";

See Match multiline text using regular expression

rogerdpack
tchrist

The metacharacter . matches any character other than a newline (line terminator). That is why your regex does not match in the multiline case.

To fix this, replace . with [\d\D], which matches any character, including newlines.

Code In Action
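
A minimal sketch of this substitution, applied to the regex and test string from the question; the class and method names are hypothetical.

```java
public class CharClassDemo {
    // [\d\D] means "a digit or a non-digit", i.e. any character, newlines included.
    static final String REGEX = "\\s*/\\*[\\d\\D]*\\*/";

    static String replaceComment(String input) {
        return input.replaceAll(REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";
        System.out.println(replaceComment(testIllegal)); // x just text
    }
}
```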

codaddict
  • Swapping in `[\d\D]` for `.` (which normally means `[^\n]`, at least in `Pattern.UNIX_LINES` mode) strikes me as inappropriate because it is not obvious what it is doing, because it is 6 chars for 1, and because there are other ways of doing this. – tchrist Nov 11 '10 at 12:25
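
The qualifier about Pattern.UNIX_LINES matters because, in Java's default mode, the dot excludes every line terminator (for example \r), not just \n. A small sketch of how the three modes treat a single character follows; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class DotLineTerminatorDemo {
    // True if a lone "." matches the given one-character string under the given flags.
    static boolean dotMatches(String ch, int flags) {
        return Pattern.compile(".", flags).matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Default mode: neither \n nor \r is matched by the dot.
        System.out.println(dotMatches("\n", 0));                  // false
        System.out.println(dotMatches("\r", 0));                  // false
        // UNIX_LINES: only \n counts as a line terminator, so \r is matched.
        System.out.println(dotMatches("\n", Pattern.UNIX_LINES)); // false
        System.out.println(dotMatches("\r", Pattern.UNIX_LINES)); // true
        // DOTALL: the dot matches everything.
        System.out.println(dotMatches("\n", Pattern.DOTALL));     // true
    }
}
```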