You can solve this by noticing that the comment must be preceded by a sequence of zero or more "units", where a unit is either:
- a single character other than `"`, or
- a string literal, which is `"` followed by zero or more non-quote characters followed by `"`.
So the following pattern should work:
"^([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))"
What I've done is prefix your pattern with `^([^"]|"[^"]*")*?` (and, of course, I had to escape the `"` characters inside the string literal). This means the match begins with zero or more "units" as I've defined them above. The trailing `*?` is a reluctant quantifier: it matches the smallest possible number of units, so that we find the first comment that follows a run of units.
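As a rough illustration, here is a minimal sketch of how the pattern might be used to pull out the first comment. The class name `FirstComment` and the helper `firstComment` are made up for this example; the comment itself ends up in capture group 2, because group 1 is the repeated unit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstComment {
    // The pattern from this answer: a reluctant run of "units",
    // followed by either a block comment or a line comment.
    static final Pattern COMMENT = Pattern.compile(
        "^([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))");

    // Hypothetical helper: returns the first comment that is not inside
    // a string literal, or null if the input contains no such comment.
    static String firstComment(String source) {
        Matcher m = COMMENT.matcher(source);
        // Group 1 is the repeated unit; group 2 is the comment text.
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstComment("String s = \"/* not a comment */\"; /* real */"));
        System.out.println(firstComment("String url = \"http://example.com\";"));
    }
}
```

The second call is the interesting one: the `//` sits inside a string literal, so it is consumed as part of a unit and never considered as a comment start.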
The leading `^` is necessary to anchor the pattern to the beginning of the string, to make sure the matcher doesn't try to start the match inside a string literal. I believe you could use `\\G` instead of `^`, since `\\G` matches at the end of the previous match (or at the start of the input when there is no previous match). That would work better if you're trying to repeat the pattern match and find all comments in a string.
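A minimal sketch of that repeated-match idea, assuming the only change is swapping the leading anchor for `\\G`. The class name `AllComments` and the helper `allComments` are invented for illustration. Each `find()` has to start exactly where the previous match ended, so the matcher never resumes in the middle of a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllComments {
    // The answer's pattern with \G in place of ^, so every match must begin
    // where the previous one ended (or at the start of the input).
    static final Pattern COMMENT = Pattern.compile(
        "\\G([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))");

    // Hypothetical helper: collects every comment outside string literals.
    static List<String> allComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            comments.add(m.group(2)); // group 2 is the comment text
        }
        return comments;
    }

    public static void main(String[] args) {
        String src = "a = \"x // y\"; // one\nb = 2; /* two */ c = \"/* z */\";";
        for (String c : allComments(src)) {
            System.out.println(c);
        }
    }
}
```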
NOTE: I've tested this, and it seems to work.
NOTE 2: The resulting regex is extremely ugly. It's very popular on StackOverflow to think that a regex can solve every possible problem including finding a cure for cancer; but when the result is as unreadable as this, it's time to start asking whether it wouldn't be simpler, more readable, and more reliable to use something boring like a loop. I don't think regexes are any more efficient, either, although I haven't checked it out.
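For comparison, here is a rough sketch of what the "boring loop" could look like. The class name `LoopScanner` and its method are hypothetical, and it makes the same simplifying assumption as the regex: a string literal runs from one `"` to the next, with no handling of escaped quotes. Two further choices are mine: a line comment ends at `\n`, and an unterminated block comment makes the method return null.

```java
public class LoopScanner {
    // Hypothetical helper: returns the first comment that is not inside
    // a string literal, or null if there is none.
    static String firstComment(String s) {
        boolean inString = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inString = !inString; // entering or leaving a string literal
            } else if (!inString && c == '/' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '/') {
                    // Line comment: runs to the end of the line (or input).
                    int end = s.indexOf('\n', i);
                    return s.substring(i, end < 0 ? s.length() : end);
                }
                if (next == '*') {
                    // Block comment: runs to the closing "*/".
                    int end = s.indexOf("*/", i + 2);
                    return end < 0 ? null : s.substring(i, end + 2);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstComment("String s = \"/* not a comment */\"; /* real */"));
    }
}
```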