Regex expression to capture '*' while excluding comment section

Question

I want to capture all the * characters in the code except those in the comment section. The regex for the comment section is: (\/\*[\S\s]*?\*\/). I tried excluding the comment section and searching for any * character preceded or followed by zero or more spaces.

Regex : [^\/\*[\S\s]*?\*\/\]\s*\*\s*


/**
 * This function is useless
 *
 * @return      sth
 */
public void testMe() {
    int x = 5*4;
    x*=7;
    x = x**2;
}

It should match all the * inside testMe.

I guess you could do a quick search for `/*` and its corresponding `*/` and then filter out stars within those regions. — Mateen Ulhaq, May 14 '18 at 05:07
Seriously, use a proper parser, or no one will read your code. — user202729, May 14 '18 at 05:07
@MateenUlhaq Which doesn't work for `"this is a string /* ******* */ "`. — user202729, May 14 '18 at 05:08
@user202729 At the risk of going down a rabbit hole... filter out all string literals too. :P Hopefully the code OP is trying to parse isn't any more convoluted than that. — Mateen Ulhaq, May 14 '18 at 05:10
@MateenUlhaq Which (again) probably doesn't work for `" string \" /* fake comment ********* */ \" string "`. — user202729, May 14 '18 at 05:11
Maybe filter out the strings using a simple LL parser rather than a regex? But at this point we should probably just give up and use some library to construct an AST for us. ¯\_(ツ)_/¯ — Mateen Ulhaq, May 14 '18 at 05:14

score 2 · Answer 1 · answered May 14 '18 at 05:16

This can be solved with the "skip what you want to avoid" scheme, implemented with a capture group, i.e. What_I_want_to_avoid|(What_I_want_to_match):

\/\*[\S\s]*?\*\/|(\*+)

The idea here is to completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we only need to check capture group $1, which, when set, contains the asterisks outside of comments.
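As a rough illustration of the capture-group idea, here is a minimal Python sketch using the standard `re` module. The function name `stars_outside_comments` and the `SAMPLE` constant are made up for this example, and the slashes are left unescaped because Python does not need `\/`. It keeps only the matches where group 1 is set and throws the comment matches away.

```python
import re

# Pattern from the answer: the first alternative consumes block comments,
# the second captures runs of asterisks found anywhere else.
PATTERN = re.compile(r"/\*[\S\s]*?\*/|(\*+)")


def stars_outside_comments(source):
    """Return (offset, text) for each asterisk run outside /* ... */ comments.

    Hypothetical helper for illustration only.
    """
    return [
        (m.start(1), m.group(1))
        for m in PATTERN.finditer(source)
        if m.group(1) is not None  # matches without group 1 are comments: discard
    ]


# The code sample from the question.
SAMPLE = """/**
 * This function is useless
 *
 * @return      sth
 */
public void testMe() {
    int x = 5*4;
    x*=7;
    x = x**2;
}
"""

for offset, stars in stars_outside_comments(SAMPLE):
    print(offset, repr(stars))
```

On the question's sample this should report three runs, `*`, `*` and `**`, all inside testMe. Note that `(\*+)` groups adjacent asterisks, so `x**2` yields one match of length two rather than two separate matches.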

Demo

wp78de

Still fails for `"/* fake comment *** */"`. – user202729 May 14 '18 at 05:24
@triandicAnt Because it [doesn't match](https://regex101.com/r/19lSBU/2). – user202729 May 14 '18 at 05:33
@user202729 I am with you, we are entering a deep dark rabbit hole. If we also have to take quoted strings into account, then we also have to allow escaped quotes inside quoted strings, check that strings are properly closed, etc. There are expressions that can help with that (e.g. [`(['"])(?:(?!\1|\\).|\\.)*\1`](https://stackoverflow.com/a/50320848/8291949)), but in the end there will always be something else that breaks it. – wp78de May 14 '18 at 08:09
@triandicAnt If you would like to search for * in strings, you can capture strings in another group and process them in a second step, see the updated [sample](https://regex101.com/r/LntwfD/2) - this pattern requires the alternative [regex](https://pypi.org/project/regex/) package. I hope this helps. – wp78de May 14 '18 at 08:28
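For what it's worth, the string-literal objection raised in these comments can be handled by adding the quoted-string expression as a second "avoid" alternative. The sketch below is an assumption-laden variant, not the pattern from the linked sample: it uses the standard `re` module rather than the `regex` package, the names `TOKEN` and `stars_in_code` are invented for illustration, and it still ignores `//` line comments and anything else a real parser would handle.

```python
import re

# Two "avoid" alternatives (block comments, quoted strings) followed by the
# one thing we actually want. Group 1 is only the opening quote character;
# group 2 holds the asterisks found in real code.
TOKEN = re.compile(
    r"""
    /\*[\S\s]*?\*/                  # block comment: matched, then discarded
  | (['"])(?:(?!\1|\\).|\\.)*\1     # quoted string with escapes: discarded too
  | (\*+)                           # group 2: asterisks outside both
    """,
    re.VERBOSE,
)


def stars_in_code(source):
    """Return asterisk runs outside block comments and quoted strings.

    Hypothetical helper; does not handle // line comments.
    """
    return [m.group(2) for m in TOKEN.finditer(source) if m.group(2) is not None]


# The counterexamples from the comments, plus one real multiplication.
tricky = r'''
String a = "/* fake comment *** */";
String b = " string \" /* fake comment ********* */ \" string ";
int x = 5*4; /* real comment ** */
'''

print(stars_in_code(tricky))  # prints ['*']
```

This works because the engine always takes the leftmost match: whichever of a comment or a string starts first swallows everything up to its own terminator, so a fake comment inside a string (or a quote inside a comment) never gets a chance to match on its own.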
