I like challenges :)
Here's my working solution:
/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm
Replace that with $1.
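To make the replacement step concrete, here is a minimal sketch that wraps the pattern in a helper. The names `COMMENT_RE` and `stripComments` are made up for illustration and are not taken from the fiddle; the regex itself is the one shown above.

```javascript
// The regex from above, written as a literal. Group 1 captures strings and
// regex literals; comments are matched outside of group 1.
const COMMENT_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper name, for illustration only.
function stripComments(source) {
  // "$1" puts back whatever group 1 captured. When a comment matched,
  // group 1 did not participate, so the match is replaced by nothing.
  return source.replace(COMMENT_RE, "$1");
}

// The line comment is dropped; the space before it stays.
console.log(JSON.stringify(stripComments('var a = 1; // note')));
// prints "var a = 1; "

// The "//" inside the string is protected because the whole string is kept.
console.log(JSON.stringify(
  stripComments('var url = "http://example.com"; /* block */ var x;')
));
// prints "var url = \"http://example.com\";  var x;"
```

The trick is that the things to keep are matched first and captured, so a `//` or `/*` inside them is consumed before the comment alternatives ever get a chance to see it.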
Fiddle here: http://jsfiddle.net/LucasTrz/DtGq8/6/
Of course, as it has been pointed out countless times, a proper parser would probably be better, but still...
NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.
Breakdown
((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- inline comments
The part to keep
(["'])(?:\\[\s\S]|.)*?\2 <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/ <-- regex literals
Strings
["'] match a quote and capture it
(?:\\[\s\S]|.)*? match escaped characters or unescaped characters, don't capture
\2 match the same type of quote as the one that opened the string
Regex literals
\/ match a forward slash
(?![*\/]) ... not followed by a * or / (that would start a comment)
(?:\\.|\[(?:\\.|.)\]|.)*? match any sequence of escaped/unescaped text, or a regex character class
\/ ... until the closing slash
The part to remove
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- inline comments
Line comments
\/\/ match two forward slashes
.*?$ then everything until the end of the line
Inline comments
\/\* match /*
[\s\S]*? then as few as possible of anything, see note below
\*\/ match */
I had to use [\s\S] instead of . because, at the time of writing, JavaScript didn't support the regex s modifier (singleline, which allows . to match newlines as well). ES2018 has since added it as the s (dotAll) flag.
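A small sketch of the difference, using only the comment part of the pattern. The variable names are made up for the example.

```javascript
// A comment that spans two lines.
const text = "a /* one\ntwo */ b";

// With "." the match cannot cross the newline, so nothing is found.
const withDot = /\/\*.*?\*\//g;

// With [\s\S] any character is allowed, newlines included.
const withClass = /\/\*[\s\S]*?\*\//g;

console.log(JSON.stringify(text.replace(withDot, "")));
// prints "a /* one\ntwo */ b" (unchanged)

console.log(JSON.stringify(text.replace(withClass, "")));
// prints "a  b"
```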
This regex will work in the following corner cases:
- Regex patterns containing / in character classes: /[/]/
- Escaped newlines in string literals
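Both corner cases can be checked with a quick sketch like the one below. The regex is the first one from this answer; `cornerCaseRe` and `strip` are hypothetical names used only here.

```javascript
// Same pattern as the first regex in this answer.
const cornerCaseRe = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper, for illustration only.
const strip = (source) => source.replace(cornerCaseRe, "$1");

// 1. A "/" inside a character class does not end the regex literal early,
//    so the trailing line comment is still found and removed.
const withClassSlash = 'var re = /[/]/; // comment';
console.log(JSON.stringify(strip(withClassSlash)));
// prints "var re = /[/]/; "

// 2. A backslash followed by a real newline inside a string: \\[\s\S]
//    consumes both, so the string (and the "//" in it) is kept whole.
const withEscapedNewline = 'var s = "one \\\ntwo // still a string"; // real comment';
console.log(JSON.stringify(strip(withEscapedNewline)));
// prints "var s = \"one \\\ntwo // still a string\"; "
```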
Final boss fight
And just for the fun of it... here's the eye-bleeding hardcore version:
/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm
This adds the following twisted edge case (fiddle, regex101):
Code = /* Comment */ /Code regex/g ; // Comment
Code = Code / Code /* Comment */ /g ; // Comment
Code = /Code regex/g /* Comment */ ; // Comment
This is highly heuristic code, you probably shouldn't use it (even less so than the previous regex) and just let that edge case blow up.
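For the curious, here is a sketch that runs the three lines above through both patterns, one line at a time. `simpleRe` and `hardcoreRe` are just labels for the two regexes from this answer. On these inputs only the second line comes out differently: the simple pattern mistakes the division slashes for regex delimiters and keeps the block comment.

```javascript
// The two patterns from this answer, under made-up names.
const simpleRe = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const hardcoreRe = /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

const edgeCases = [
  "Code = /* Comment */ /Code regex/g ; // Comment",
  "Code = Code / Code /* Comment */ /g ; // Comment",
  "Code = /Code regex/g /* Comment */ ; // Comment",
];

// Process each line on its own with both patterns.
const simpleOut = edgeCases.map((line) => line.replace(simpleRe, "$1"));
const hardcoreOut = edgeCases.map((line) => line.replace(hardcoreRe, "$1"));

edgeCases.forEach((line, i) => {
  console.log("input:   ", JSON.stringify(line));
  console.log("simple:  ", JSON.stringify(simpleOut[i]));
  console.log("hardcore:", JSON.stringify(hardcoreOut[i]));
});
// For the second line:
//   simple:   "Code = Code / Code /* Comment */ /g ; "
//   hardcore: "Code = Code / Code  /g ; "
```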