I need to match the text between /* and */, i.e. Java block comments.

So far I have written a program that doesn't work as intended: the match doesn't stop at the first closing token */. Here is the code:

public static void main(String[] args) {
    String s = "public void /* sdksd\n*k/sss\\d\nsd */ main class\n/*String s = null;*/trtgg";
    Matcher matcher = Pattern.compile("(?s)/\\*(.*)(?=\\*/)").matcher(s);
    while (matcher.find()) {
        String group = matcher.group(1);
        System.out.println("group=" + group);
    }
}

It prints:

group= sdksd
*k/sss\d
sd */ main class
/*String s = null;

The expected output is:

group= sdksd
*k/sss\d
sd 
group=String s = null;

Why doesn't it stop at the first closing token */?

Is there another way to achieve this?

Jay Smith
  • `.*` is greedy, so you need to use `.*?` to make it lazy. Comment extraction can be pretty tricky, though. – anubhava May 10 '17 at 15:11
  • Use [this solution](http://stackoverflow.com/a/36328890/3832970). It needs less than a quarter of the steps to complete a match compared with the best pattern suggested by Jiri. – Wiktor Stribiżew May 10 '17 at 15:51

2 Answers

This assumes you don't allow nested comments, like /* outer comment /* nested comment */ outer comment */.

In Java, you can use non-greedy (lazy) matching. Add the (?s) flag so that . also matches the line breaks inside multi-line comments:

(?s)/\\*.*?\\*/

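Or, to avoid non-greedy matching (some regex flavors don't support it):

/\\*([^*]|\\*[^/])*\\*/

Note that this pattern misses comments that end in **/, because the alternative \\*[^/] consumes both asterisks. A variant that handles them is:

/\\*[^*]*\\*+([^/*][^*]*\\*+)*/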
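
To see how these alternatives behave, here is a minimal sketch that runs three patterns over the same input. The class name `CommentPatterns`, the helper `findAll`, and the sample string are made up for illustration, and the third, unrolled pattern is an assumption of this sketch, shown as a common way to handle comments ending in `**/`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentPatterns {
    // Lazy quantifier; DOTALL lets '.' match line breaks inside a comment.
    static final Pattern LAZY = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Alternation form, no lazy quantifier needed.
    static final Pattern ALTERNATION = Pattern.compile("/\\*([^*]|\\*[^/])*\\*/");

    // Unrolled form (assumption of this sketch): runs of non-asterisks
    // separated by runs of asterisks, closed by a slash.
    static final Pattern UNROLLED = Pattern.compile("/\\*[^*]*\\*+([^/*][^*]*\\*+)*/");

    // Hypothetical helper: collects every full match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "a /* one\n two */ b /** doc **/ c";
        System.out.println(findAll(LAZY, s).size());        // 2
        System.out.println(findAll(ALTERNATION, s).size()); // 1, misses "/** doc **/"
        System.out.println(findAll(UNROLLED, s).size());    // 2
    }
}
```

The alternation form fails on the second comment because `\\*[^/]` swallows the two asterisks before the closing slash, leaving nothing for the final `\\*/` to match.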

If you do allow nested comments, the resulting language is no longer regular and cannot be described by a true regular expression. (It might still be possible with Java's regular expressions, since they are not strictly regular.)
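
For the nested case, one option is to skip regular expressions and count nesting depth by hand. The following is only a rough sketch under that assumption: the class `NestedComments` and the method `outermostComments` are hypothetical names, and the scanner deliberately ignores string literals and line comments.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedComments {
    // Hypothetical helper: returns each outermost comment, delimiters included.
    static List<String> outermostComments(String src) {
        List<String> found = new ArrayList<>();
        int depth = 0;   // how many "/*" are currently open
        int start = -1;  // index where the outermost open comment began
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                if (depth == 0) {
                    start = i;
                }
                depth++;
                i += 2;
            } else if (depth > 0 && src.startsWith("*/", i)) {
                depth--;
                i += 2;
                if (depth == 0) {
                    found.add(src.substring(start, i));
                }
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "x /* outer /* nested */ outer */ y /* flat */";
        for (String comment : outermostComments(s)) {
            System.out.println(comment);
        }
        // prints:
        // /* outer /* nested */ outer */
        // /* flat */
    }
}
```

The counter is what a true regular expression cannot express: it has to remember an unbounded number of open delimiters.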

Jiri Tousek

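Indeed: .* is greedy, so it first consumes as much as possible and then backtracks only as far as the last position where the lookahead (?=\\*/) succeeds. To make the regex engine test (?=\\*/) after each single character consumed, the quantifier has to be made lazy: .*?. Here is the corrected code: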

Matcher matcher = Pattern.compile("(?s)/\\*(.*?)(?=\\*/)").matcher(s);
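
As a self-contained check of the difference, the sketch below runs the greedy and the lazy pattern against the string from the question. The class name `LazyCommentMatch` and the helper `extract` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyCommentMatch {
    // Hypothetical helper: returns capture group 1 of every match of regex in input.
    static List<String> extract(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            groups.add(matcher.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "public void /* sdksd\n*k/sss\\d\nsd */ main class\n/*String s = null;*/trtgg";

        // Greedy: one match that runs up to the last "*/".
        List<String> greedy = extract("(?s)/\\*(.*)(?=\\*/)", s);
        System.out.println(greedy.size()); // 1

        // Lazy: one match per comment, each stopping at its own "*/".
        List<String> lazy = extract("(?s)/\\*(.*?)(?=\\*/)", s);
        System.out.println(lazy.size());   // 2
        for (String group : lazy) {
            System.out.println("group=" + group);
        }
    }
}
```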
Jay Smith