I want to capture every * in the code except those inside comment sections. The regex for a comment section is: (\/\*[\S\s]*?\*\/). I tried excluding the comment section and searching for any * character preceded or followed by zero or more spaces.

Regex : [^\/\*[\S\s]*?\*\/\]\s*\*\s*


/**
 * This function is useless
 *
 * @return      sth
 */
public void testMe() {
    int x = 5*4;
    x*=7;
    x = x**2;
}

It should match all the * inside testMe.

triandicAnt

1 Answer

This can be solved with the "skip what you want to avoid" scheme using capture groups, i.e. What_I_want_to_avoid|(What_I_want_to_match):

\/\*[\S\s]*?\*\/|(\*+)

The idea here is to completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we only need to check capture group $1, which, when set, contains the asterisks outside of comments.

Demo
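
As a rough sketch of how the capture-group check might look in practice, here is the same pattern applied with Python's standard `re` module. The names `PATTERN`, `SOURCE`, and `asterisks_outside_comments` are made up for this illustration and are not part of the answer. The loop ignores every overall match and keeps only those where group 1 is set.

```python
import re

# Left branch: a block comment, matched only so that it gets consumed (the "trash bin").
# Right branch: a run of asterisks, captured in group 1.
PATTERN = re.compile(r"/\*[\s\S]*?\*/|(\*+)")

SOURCE = """/**
 * This function is useless
 *
 * @return      sth
 */
public void testMe() {
    int x = 5*4;
    x*=7;
    x = x**2;
}
"""


def asterisks_outside_comments(text):
    """Return (offset, run) pairs for asterisk runs that are not inside /* ... */."""
    return [
        (m.start(1), m.group(1))
        for m in PATTERN.finditer(text)
        if m.group(1) is not None  # group 1 unset means the match was a comment
    ]


hits = asterisks_outside_comments(SOURCE)
print([run for _, run in hits])  # ['*', '*', '**']
```

Because the comment branch comes first in the alternation, the engine consumes a whole comment before it ever gets a chance to try the asterisk branch on the characters inside it.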

wp78de
  • Still fails for `"/* fake comment *** */"`. – user202729 May 14 '18 at 05:24
  • @triandicAnt Because it [doesn't match](https://regex101.com/r/19lSBU/2). – user202729 May 14 '18 at 05:33
  • @user202729 I am with you, we are entering a deep dark rabbit hole. If we also have to take quoted strings into account, then we also have to allow escaped quotes inside quoted strings, check that strings are properly closed, etc. There are expressions that can help with that (e.g. [`(['"])(?:(?!\1|\\).|\\.)*\1`](https://stackoverflow.com/a/50320848/8291949)), but in the end there will always be something else that breaks it. – wp78de May 14 '18 at 08:09
  • @triandicAnt If you would like to search for * in strings, you can capture strings in another group and process them in a second step, see the updated [sample](https://regex101.com/r/LntwfD/2) - this pattern requires the alternative [regex](https://pypi.org/project/regex/) package. I hope this helps. – wp78de May 14 '18 at 08:28
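
One possible way to handle the quoted-string case raised in the comments is to add the string expression quoted above as a second "trash bin" branch. The sketch below is not the linked regex101 sample: it uses only the standard `re` module, and the group names `q` and `star` as well as the function name are chosen here for illustration. It assumes literals do not span lines and it does not handle `//` line comments.

```python
import re

TOKEN = re.compile(
    r"""/\*[\s\S]*?\*/"""                             # block comment: discard
    r"""|(?P<q>['"])(?:(?!(?P=q)|\\).|\\.)*(?P=q)"""  # quoted literal: discard
    r"""|(?P<star>\*+)"""                             # asterisks we want to keep
)


def stars_outside_comments_and_strings(text):
    """Return asterisk runs that are neither in /* ... */ nor in a quoted literal."""
    return [
        m.group("star")
        for m in TOKEN.finditer(text)
        if m.group("star") is not None
    ]


sample = 'String s = "/* fake comment *** */"; int y = 2*3; /* real * */ char c = \'*\';'
print(stars_outside_comments_and_strings(sample))  # ['*']
```

Named groups are used so that adding the quote-capturing group does not shift the number of the group that holds the asterisks.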