
I am writing a lexer in which I want to remove comments such as:

/* this is a block comment that can
   span across multiple lines */
// this is a line comment that can only span one line

I have already tried /\*.*\*/, which matches block comments. The problem is that the regex match always seems to be the longest possible one: if the same code contains several block comments, it matches the substring from the start of the first block comment to the end of the last one. This is what I would like to fix. I assume I can add something after the .* that checks there is no */ inside the comment itself, but I do not know how to tell a Java regex not to match a specific sequence within the substring.

How can I edit /\*.*\*/ to stop String.replaceAll() matching across several comments? (I can use the same solution for the line comments for // and \n)

Crazy Redd
  • This is called a greedy match: https://stackoverflow.com/questions/11898998/how-can-i-write-a-regex-which-matches-non-greedy – ZbyszekKr Aug 18 '16 at 20:32

1 Answer


You want to make the match non-greedy. Putting a ? after the quantifier does that:

/\*.*?\*/

The ? after the * tells the regex engine to match as few characters as possible while still letting the whole pattern match the text.
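To see the non-greedy pattern applied from Java, here is a minimal sketch. The class name CommentStripper and the method stripComments are made up for illustration and do not come from the question. It also assumes you want block comments that span lines to be removed: in Java, . does not match line terminators by default, so the sketch adds the (?s) (DOTALL) flag. The line-comment pattern //[^\n]* is one possible way to handle the second case and stops before the newline.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
final class CommentStripper {
    // (?s) turns on DOTALL so '.' also matches line terminators;
    // '*?' is the reluctant (non-greedy) quantifier, so each match
    // ends at the first "*/" after its opening "/*".
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("(?s)/\\*.*?\\*/");

    // Assumption: a line comment runs from "//" up to, but not
    // including, the next '\n'.
    private static final Pattern LINE_COMMENT =
            Pattern.compile("//[^\\n]*");

    static String stripComments(String source) {
        // Remove block comments first, then line comments.
        String withoutBlocks = BLOCK_COMMENT.matcher(source).replaceAll("");
        return LINE_COMMENT.matcher(withoutBlocks).replaceAll("");
    }

    public static void main(String[] args) {
        String code = "int a; /* first */ int b; /* second\n spans lines */ int c; // trailing\n";
        // "int b;" survives because each block comment is matched separately.
        System.out.print(stripComments(code));
    }
}
```

With the greedy /\*.*\*/ the first match would swallow int b; as well. Note that this sketch knows nothing about string literals, so comment markers inside quoted strings are treated as real comment delimiters.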

Israel Unterman
  • But be careful about lines like: `/* Comments end with "*/" and can be multiline*/` (Unless you assume the code will compile successfully.) – FredK Aug 18 '16 at 20:35