I am writing a lexer in which I want to remove comments such as:
/* this is a block comment that can
span across multiple lines */
// this is a line comment that can only span one line
I have already attempted this with /\*.*\*/ which matches block comments. The problem is that the regex always seems to produce the longest possible match: if the code contains several block comments, it matches the substring from the start of the first block comment to the end of the last one. This is what I would like to fix. I assume I can add something after the .* that checks that there is no */ inside the comment itself. However, I do not know how to tell a Java regex not to match a specific word within the substring.
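A minimal reproduction of the behavior described, as a sketch: the class name, the helper stripGreedy, and the sample input are made up for illustration, and the input is kept on a single line because . does not match line terminators in Java by default.

```java
class GreedyCommentDemo {
    // Hypothetical helper that applies the pattern from the question unchanged.
    static String stripGreedy(String source) {
        // .* is greedy: it extends to the LAST */ on the line, not the first.
        return source.replaceAll("/\\*.*\\*/", "");
    }

    public static void main(String[] args) {
        String source = "int a; /* first */ int b; /* second */ int c;";
        // Everything from the first /* to the last */ is removed, including "int b;".
        System.out.println(stripGreedy(source)); // prints "int a;  int c;"
    }
}
```

The declaration of b disappears together with both comments, which is the unwanted result.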
How can I edit /\*.*\*/ to stop String.replaceAll() from matching across several comments?
(I can use the same solution for line comments, which run from // to \n.)
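One possible direction, offered as a sketch rather than a definitive answer: java.util.regex supports reluctant quantifiers, so .*? stops at the first */ instead of the last, and Pattern.DOTALL lets . cross line breaks for multi-line block comments. The class name CommentStripper and the method strip are hypothetical. The sketch assumes comment markers never appear inside string literals, which a real lexer would have to handle separately.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // First alternative: /* ... */ with a reluctant .*? so it ends at the first */.
    // Second alternative: // up to, but not including, the next \n.
    // DOTALL makes '.' match line terminators so block comments can span lines.
    private static final Pattern COMMENTS =
            Pattern.compile("/\\*.*?\\*/|//[^\\n]*", Pattern.DOTALL);

    // Hypothetical helper: removes every comment match from the source text.
    static String strip(String source) {
        return COMMENTS.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "a /* x\ny */ b /* z */ c // tail\nd";
        // Each comment is removed on its own; the code between them survives.
        System.out.println(strip(source));
    }
}
```

The line-comment branch uses a negated character class instead of .*?\n so that the newline itself is kept and the following line does not get joined to the previous one.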