Java regex - comment detector inside string

Question

How can I detect comments with a regex, without matching text that only looks like a comment because it sits inside a string literal?
For example:

//----------------------example-----------------------------------------
class fo{
    void foo(){
        /***print comment
        */
        System.out.println("example writing comment // this is comment");
        System.out.println("example comment 1 /* comment1 */");
        System.out.println("example comment 2 /* comment2 "+
                           "*/");
    }
}

Here is my comment-detection pattern:

Pattern.compile("^([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))");

But it doesn't work.

So `// this is comment`, `/* comment1 */`, and `/* comment2 "+ "*/` must not be matched, because they are inside string literals.

See [Java - Regex - Remove comments](http://stackoverflow.com/questions/28411032/java-regex-remove-comments). — , Jan 05 '16 at 07:42

ajb · Answer 1 · 2016-01-05T07:49:32.583

You can solve this by noticing that the comment must be preceded by a sequence of zero or more "units", where you define a unit as:

a single character other than ", or
a string literal, which is " followed by zero or more non-quote characters followed by ".

So this pattern should work:

"^([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))"

What I've done is precede your pattern with

^([^"]|"[^"]*")*?

(and, of course, I had to escape the " characters). This means the string begins with 0 or more "units" as I've defined them above. The last *? means that we match the smallest possible number of units, so that we find the first comment that follows one of the units.
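As a minimal sketch of how the pattern might be used (the class and method names here are made up for illustration, and reading the comment from group 2 is my assumption about the intended usage), the following returns the first comment that lies outside a string literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstCommentFinder {
    // The pattern from this answer: lazily skip "units" from the start of the
    // input, then capture a block comment or a line comment in group 2.
    static final Pattern COMMENT = Pattern.compile(
        "^([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))");

    /** Returns the first comment outside a string literal, or null if none is found. */
    static String firstComment(String source) {
        Matcher m = COMMENT.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Comment-like text inside a string literal: no comment is reported.
        System.out.println(firstComment(
            "System.out.println(\"example writing comment // this is comment\");"));
        // A real line comment after the code.
        System.out.println(firstComment("int x = 1; // a real comment"));
    }
}
```

Note that, like the pattern itself, this sketch treats a string literal as everything between two " characters, so it does not account for escaped quotes inside a literal.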

The leading ^ is necessary to anchor the pattern to the beginning of the string, to make sure the matcher doesn't try to start the match inside a string literal. I believe you could use \\G instead of ^, since \\G matches at the end of the previous match (or at the start of the input on the first match attempt). That would work better if you're trying to repeat the pattern match and find all comments in a string.
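A rough sketch of the \\G variant, assuming the goal is to collect every comment in a source string (the class and method names are hypothetical): each call to `find()` resumes exactly where the previous match ended, so the "units" are always counted from a position known to be outside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllCommentsFinder {
    // The pattern from this answer with \G in place of ^, so that every match
    // must begin where the previous match ended.
    static final Pattern COMMENT = Pattern.compile(
        "\\G([^\"]|\"[^\"]*\")*?((/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*))");

    /** Collects all comments that start outside a string literal. */
    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            comments.add(m.group(2)); // group 2 holds the comment text
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "int a = 1; // first\n"
                      + "String s = \"// not a comment\"; /* second */\n";
        for (String comment : findComments(source)) {
            System.out.println(comment);
        }
    }
}
```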

NOTE: I've tested this, and it seems to work.

NOTE 2: The resulting regex is extremely ugly. It's very popular on StackOverflow to think that a regex can solve every possible problem including finding a cure for cancer; but when the result is as unreadable as this, it's time to start asking whether it wouldn't be simpler, more readable, and more reliable to use something boring like a loop. I don't think regexes are any more efficient, either, although I haven't checked it out.
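For comparison, here is one way the "boring loop" could look. This is only a sketch under my own assumptions, not a full Java lexer: the names are invented, character literals are ignored, and, unlike the regex, it treats a backslash inside a string literal as escaping the next character.

```java
import java.util.ArrayList;
import java.util.List;

class CommentScanner {
    /** Returns every comment in src that starts outside a double-quoted string literal. */
    static List<String> findComments(String src) {
        List<String> comments = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            if (src.charAt(i) == '"') {
                // Skip the string literal; a backslash escapes the next character.
                i++;
                while (i < n && src.charAt(i) != '"') {
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                i++; // step past the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: runs to the end of the line (or of the input).
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                comments.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Block comment: runs to the first "*/" (or to the end if unterminated).
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                comments.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }
}
```

Calling `CommentScanner.findComments(source)` on the text of a file yields the comments in order of appearance; each branch of the loop corresponds to one of the cases the regex has to encode in a single expression.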

It worked the way I expected it to. If it's not working for you, please edit your question and include the new code you're using, the source string, what you expected the output to be, and what the output actually was. (Or start a new question.) Include every relevant piece of code. In particular, if you're using `group()` to extract part of the input string, and it's not working as you expect, please show us how you're using it. — ajb, Jan 13 '16 at 04:40

William Callahan · Answer 2 · 2016-01-13T16:06:44.527

The regex that you created was missing a few escaped characters, although that may not fit what you are trying to do. Here is your corrected version:

Pattern.compile("((\\/\\*([^\\*]|(\\*(?!\\/))+)*\\+\\*\\+\\/)|(\\/\\/.*))");

However, if you are looking to use a regex find-and-replace in the IDE, use \".*?(\/\/.*?)\" and replace the group $1 with an empty string.

If you would like to use Java to replace the string, try the following:

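import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 is the text before "//"; group 2 is the comment through the end of the line.
// The trailing quantifier is greedy: a lazy .*? there would match nothing after "//".
Pattern p = Pattern.compile("(.*?)(\\/\\/.*)");
String output = "";
String input = "example writing comment // this is comment";
Matcher m = p.matcher(input);
if (m.find())
    output = m.replaceFirst("$1"); // keep only the text before the comment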

Edit

Based on your updated question, I have provided the following answer. However, your question is still unclear.

Pattern p = Pattern.compile("^((.*?)((\\/\\/.*?)|(\\/\\*(.*)\\*\\/)(.*))?)$");
String output = "";
String input = "example writing comment // this is comment";
Matcher m = p.matcher(input);
if (m.find())
    output = m.replaceAll("$2$7");

This example will replace strings as follows:

"example writing comment // this is comment"
- example writing comment
"example comment 1 /* comment1 */"
- example comment 1
"example comment 2 /* comment2 "
- example comment 2 /* comment2

edited Jan 13 '16 at 16:06

answered Jan 05 '16 at 07:14

Your regex doesn't answer the question. The OP wants to be able to do a match that ignores comments inside quoted strings. And I don't see any quote marks in your regex, therefore your regex is not doing anything special with quoted strings. – ajb Jan 05 '16 at 07:24

Furthermore, your corrections are misguided. Forward slashes have no special meaning in a regex, so escaping them is pointless. The plus signs you escaped are being used as possessive modifiers for the quantifiers preceding them, so escaping them broke the regex. – Alan Moore Jan 05 '16 at 07:29
First, the question did not define the exact context in how the OP would like to use the regex expression. Second, the "+" which I escaped were correct for the context in which I changed them to; the OP had a misguided use of them. Third, the forward slashes do have meaning in some languages. For example, JavaScript exclusively uses forward slashes to define a regex expression. It is good practice to escape them. However, I did not test anything in my IDE because I was away from the computer that had IntelliJ or Eclipse. Also, I rarely write regex for Java. – William Callahan Jan 05 '16 at 07:44
I have never heard of it being a "good practice" to escape forward slashes in Java regexes. – ajb Jan 05 '16 at 07:49
I have updated my post; I hope that example is enough. – newbie Jan 13 '16 at 06:56
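
The two points raised in the comments above (escaping forward slashes, and `+` as a possessive modifier) can be checked with a few small patterns. This is an illustrative sketch with a made-up class name, using only `Pattern.matches` from `java.util.regex`:

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    /** Thin wrapper: true if the whole input matches the regex. */
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A forward slash is an ordinary character in a Java regex,
        // so the escaped and unescaped forms accept the same input.
        System.out.println(matches("//.*", "// x"));
        System.out.println(matches("\\/\\/.*", "// x"));

        // a*+ is a possessive quantifier: it matches like a* but never backtracks.
        System.out.println(matches("a*+b", "aaab"));
        System.out.println(matches("a*+a", "aaa"));

        // Escaping the plus turns it into a literal '+' that must appear in the input.
        System.out.println(matches("a*\\+b", "aaab"));
        System.out.println(matches("a*\\+b", "aaa+b"));
    }
}
```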
