0

I've written a regex to capture code comments. It seems to work, except when the comment contains a `*` followed, after any number of characters, by a `/`, e.g.:

/* these are some comments =412414515/ * somecharacters /][;';'] */

Regex: (\/\*[^*]*[^/]*\*\/)

https://regex101.com/r/xmpTzw/2

doctopus

2 Answers

4
\/\*[\s\S]*?\*\/

Just use a lazy quantifier (`*?`) instead of trying to exclude `*` with a negated character class.
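To make the difference concrete, here is a minimal sketch that runs both patterns against the sample comment from the question. The variable names (`sample`, `original`, `lazy`) are only illustrative and are not part of the original post.

```javascript
// Sample comment from the question: it contains "* ... /" in the middle.
const sample = "/* these are some comments =412414515/ * somecharacters /][;';'] */";

// Pattern from the question: [^*]* cannot cross the inner "*",
// and [^/]* cannot cross the inner "/", so the closing "*/" is never reached.
const original = /(\/\*[^*]*[^/]*\*\/)/;

// Lazy pattern from this answer: consume as little as possible until the first "*/".
const lazy = /\/\*[\s\S]*?\*\//;

const originalMatch = sample.match(original);
const lazyMatch = sample.match(lazy);

console.log(originalMatch);           // null
console.log(lazyMatch[0] === sample); // true
```

The lazy version stops at the first `*/` it finds, which is also how a C-style comment ends.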

David Hayes
  • The OP is correct that, in general, a negated character class is preferred to a lazy quantifier because it prevents backtracking (potentially catastrophic backtracking: if the file is big enough, the match can time out). However, a literal * is valid inside a comment, so you can't use the negated class [^*] to match the body of the comment. – viggity Dec 06 '17 at 03:12
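If backtracking is a concern, one possible alternative is the "unrolled loop" form of the same match. It uses only negated classes but still allows a literal `*` inside the comment. This pattern is not from the thread. It is a commonly used technique shown here as a sketch, and whether it is actually faster depends on the regex engine and the input.

```javascript
// Unrolled-loop pattern (an alternative, not taken from the answers above):
//   \/\*            opening "/*"
//   [^*]*\*+        anything up to a run of one or more "*"
//   (?:[^/*][^*]*\*+)*  if that run is not followed by "/", keep going to the next run
//   \/              the "/" that closes the comment
const unrolled = /\/\*[^*]*\*+(?:[^/*][^*]*\*+)*\//g;
const lazyGlobal = /\/\*[\s\S]*?\*\//g;

// Hypothetical test input with stars inside and at the end of a comment.
const source = "/* a*b **/ x /* c */";

const unrolledMatches = source.match(unrolled);
const lazyMatches = source.match(lazyGlobal);

console.log(unrolledMatches); // [ '/* a*b **/', '/* c */' ]
console.log(lazyMatches);     // [ '/* a*b **/', '/* c */' ]
```

On this input both patterns return the same comments. The unrolled form simply never needs to give characters back once they are consumed.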
2

For a start, I suggest this pattern:

(\/\*[\S\s]*?\*\/)

Demo

const regex = /(\/\*[\S\s]*?\*\/)/g;
const str = `This is/ some code /* these are some comments
=412414515/  * somechars /  ][;';'] */*/
Some more code 
/* and some more unreadable comments a[dpas[;[];135///]] 
d0gewt0qkgekg;l''\\////
*/ god i hate regex  /* asda*asd
\\asd*sd */`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
wp78de
  • One thing I want to point out is the second line: the multiline comment is terminated by the first */, and the second */ is not part of the comment, just like in a normal C-style parser. – wp78de Dec 06 '17 at 03:04
  • The solution works, but I have a question because I still don't really understand it. `[\S\s]*` matches any non-whitespace or whitespace character zero or more times. Isn't that the same as `.*`? – doctopus Dec 06 '17 at 03:17
  • `.` doesn't match newlines by default, whereas `[\S\s]` matches any character, line breaks included – David Hayes Dec 06 '17 at 03:36
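The difference between `.` and `[\S\s]` can be checked with a short sketch like the one below. The input string and variable names are made up for illustration. The `s` (dotAll) flag is an ES2018 feature, so it is assumed here that the runtime supports it.

```javascript
// A comment that spans two lines (hypothetical input).
const text = "/* line one\nline two */";

// `.` does not match line terminators, so this cannot reach the closing "*/".
const dot = /\/\*.*?\*\//;

// [\s\S] is "whitespace or non-whitespace", i.e. any character at all.
const anyChar = /\/\*[\s\S]*?\*\//;

// With the dotAll flag `s`, `.` matches line terminators too.
const dotAll = /\/\*.*?\*\//s;

console.log(dot.test(text));     // false
console.log(anyChar.test(text)); // true
console.log(dotAll.test(text));  // true
```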