I have to agree with the comments that in most cases it is better to use a parser, even when a RegExp can do the job for a specific and well-defined use case.
The problem is not that you can't make it work for that very specific use case, even though there are probably plenty of edge cases that you don't really care about (nor have to) but that may break that solution. The actual problem is that if you start building around your sub-optimal solution and your requirements evolve over time, you will start patching those edge cases as they appear. Someday, you may find yourself with an extensive codebase full of patches that doesn't scale anymore, and the only solution will probably be to start from scratch.
Anyway, you have been warned by a few of us, and it is still possible that your use case really is that simple and will not change in the future. I would still consider moving from RegExp to a parser at some point, but maybe you can use this in the meantime:
(^ +\/\/(.*))|(["'`]+.*["'`]+.*\/\/(.*))|(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)|(^ +\/\*([\W\w\n\r]+?)\*\/)
Just in case, I have added a few other cases, such as comments that come straight after some valid code.

Edit to prove the first point and what is being said in the comments:
I had just answered this with the previous RegExp, which solved only the issue that you pointed out in your question (your RegExp was misinterpreting strings containing glob patterns as code comments).
So, I fixed that and I even made it able to match comments that start on the same line as a valid (non-commented) statement. Just a moment after posting that, I noticed that this last feature only works if that statement contains a string.
This is the updated version, but please keep in mind that this is exactly what we are warning you about:
(^[^"'`\n]+\/\/(.*))|(["'`]+.*["'`]+.*\/\/(.*))|(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)|(^[^"'`\n]+\/\*([\W\w\n\r]+?)\*\/)
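
To make the difference between the two versions concrete, here is a minimal sketch that runs both against a comment that follows a statement. Only the two patterns come from this answer; the variable names and the m flag are my own assumptions, since the expressions above are shown without flags.

```javascript
// First version: a comment is only found after leading spaces or after a string.
const firstVersion = /(^ +\/\/(.*))|(["'`]+.*["'`]+.*\/\/(.*))|(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)|(^ +\/\*([\W\w\n\r]+?)\*\/)/m;

// Updated version: a comment may also follow any code that contains no quotes.
const updatedVersion = /(^[^"'`\n]+\/\/(.*))|(["'`]+.*["'`]+.*\/\/(.*))|(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)|(^[^"'`\n]+\/\*([\W\w\n\r]+?)\*\/)/m;

// Two sample statements, each followed by a comment on the same line.
const withoutString = 'const x = 1; // trailing comment';
const withString = 'const s = "a"; // trailing comment';

const results = {
  firstWithoutString: firstVersion.test(withoutString),     // false: no string before the comment
  firstWithString: firstVersion.test(withString),           // true
  updatedWithoutString: updatedVersion.test(withoutString), // true
  updatedWithString: updatedVersion.test(withString),       // true
};

console.log(results);
```

The first version misses the comment as soon as the statement before it has no string, which is the kind of patch-after-patch situation described at the top of this answer.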

How does it work?
There are 4 main groups that compose the whole RegExp, the first two for single-line comments and the next two for multi-line comments:
(^[^"'`\n]+\/\/(.*))
(["'`]+.*["'`]+.*\/\/(.*))
(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)
(^[^"'`\n]+\/\*([\W\w\n\r]+?)\*\/)
You will see there are some repeated patterns:
^[^"'`\n]+ : From the start of a line, match one or more characters, none of which is a quote of any kind or a line break.
The backtick (`) is there for ES2015 template literals.
Line breaks are excluded as well to prevent matching empty lines.
Note that the + will prevent matching comments that are not padded with at least one space. You can try replacing it with *, but then it will match strings containing glob patterns again.
["'`]+.*["'`]+.* : This matches anything between quotes, including anything that looks like a comment but is part of a string. Whatever you match after it will be outside that string, so you can use another group to match comments.
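
Putting the pieces together, the following is a rough sketch of how the updated expression could be used to pull the comment text out of some source code. The helper name extractComments, the sample input and the g and m flags are assumptions of mine for illustration, not part of the question.

```javascript
// The updated expression from this answer. The flags are an assumption:
// g to find every comment, m so that ^ matches at the start of each line.
const COMMENT_RE = /(^[^"'`\n]+\/\/(.*))|(["'`]+.*["'`]+.*\/\/(.*))|(["'`]+.*["'`]+.*\/\*([\W\w\n\r]+?)\*\/)|(^[^"'`\n]+\/\*([\W\w\n\r]+?)\*\/)/gm;

// Hypothetical helper: returns the trimmed text of every comment the expression finds.
function extractComments(source) {
  const comments = [];
  for (const match of source.matchAll(COMMENT_RE)) {
    // Each of the 4 main groups wraps one inner group holding the comment text:
    // 2 and 4 for single-line comments, 6 and 8 for multi-line comments.
    const text = match[2] ?? match[4] ?? match[6] ?? match[8];
    comments.push(text.trim());
  }
  return comments;
}

const sample = [
  '  // leading comment',
  'const glob = "src/**/*.js";',
  'const x = 1; // trailing comment',
  'const s = "a//b"; // after a string',
  '  /* block',
  '     comment */',
  '// at column 0',
].join('\n');

const found = extractComments(sample);
console.log(found);
// [ 'leading comment', 'trailing comment', 'after a string', 'block\n     comment' ]
```

The glob string is left alone, and the comment that follows the "a//b" string is found without the // inside the string being taken for one. The last line is not matched at all, because of the + discussed above: nothing precedes the // on that line.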