I recently needed to do exactly this (i.e. remove all comments from an HTML file). Some things the other answers don't take into consideration:
- An HTML file can have inline CSS and JS, and I wanted the comments stripped from those as well
- Comment syntax inside a string or regex literal is perfectly valid content and has to be left alone. (My string/regex exclusion pattern is based on: https://stackoverflow.com/a/23667311/3799617)
TLDR: (I just want the regex that removes all the comments, plz)
/\\\/|\/\s*(?:\\\/|[^\/\*\n])+\/|\\"|"(?:\\"|[^"])*"|\\'|'(?:\\'|[^'])*'|\\`|`(?:\\`|[^`])*`|(\/\/[\s\S]*?$|(?:<!--|\/\s*\*)\s*[\s\S]*?\s*(?:-->|\*\s*\/))/gm
And here is a simple demo: https://www.regexr.com/5fjlu
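One caveat when using it: the pattern also matches strings and regex literals so that it can skip over them, and only real comments end up in capture group 1. A plain `.replace(regex, '')` would delete the strings too. Here is a minimal sketch of the intended usage; `stripComments` and the sample `html` are names made up for this illustration:

```javascript
// The regex from above, copied verbatim. The leading alternatives match
// regex literals and strings; only the final group captures comments.
const commentOrString = /\\\/|\/\s*(?:\\\/|[^\/\*\n])+\/|\\"|"(?:\\"|[^"])*"|\\'|'(?:\\'|[^'])*'|\\`|`(?:\\`|[^`])*`|(\/\/[\s\S]*?$|(?:<!--|\/\s*\*)\s*[\s\S]*?\s*(?:-->|\*\s*\/))/gm;

// Hypothetical helper: drop a match only when group 1 (a real comment) is set,
// otherwise put the matched string or regex literal back unchanged.
function stripComments(source) {
  return source.replace(commentOrString, (match, comment) =>
    comment === undefined ? match : ''
  );
}

const html = [
  '<!-- page header -->',
  '<p>Hello</p>',
  '<script>',
  'const a = "// not a comment";',
  '/* block comment */',
  'const b = 1; // trailing comment',
  '</script>',
].join('\n');

const stripped = stripComments(html);
console.log(stripped);
```

The three comments are removed, while the `"// not a comment"` string survives because it is consumed by the string alternative before the comment group gets a chance to match.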
I don't hate reading, show me the rest:
I also needed several other kinds of matching that had to ignore text inside valid strings, even when that text looked like a valid target. So I made a class to handle my variety of uses.
class StringAwareRegExp extends RegExp {
    // split() and matchAll() clone the regex through the species constructor;
    // returning plain RegExp keeps the pattern from being wrapped a second time.
    static get [Symbol.species]() { return RegExp; }
    constructor(regex, flags){
        if(regex instanceof RegExp) regex = StringAwareRegExp.prototype.regExpToInnerRegexString(regex);
        // String/regex-literal alternatives come first (uncaptured); the real target is capture group 1.
        super(`${StringAwareRegExp.prototype.disqualifyStringsRegExp}(${regex})`, flags);
    }
    // Replace only real targets; matches without group 1 are strings/regex literals and are kept as-is.
    stringReplace(sourceString, replaceString = ''){
        return sourceString.replace(this, (match, group1) => { return group1 === undefined ? match : replaceString; });
    }
}
// Pattern text without the surrounding slashes and flags (works for any flag combination).
StringAwareRegExp.prototype.regExpToInnerRegexString = function(regExp){ return regExp.source; };
Object.defineProperty(StringAwareRegExp.prototype, 'disqualifyStringsRegExp', {
    get: function(){
        return StringAwareRegExp.prototype.regExpToInnerRegexString(/\\\/|\/\s*(?:\\\/|[^\/\*\n])+\/|\\"|"(?:\\"|[^"])*"|\\'|'(?:\\'|[^'])*'|\\`|`(?:\\`|[^`])*`|/);
    }
});
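The core trick of the class, in miniature: put the things you want to skip first in the alternation without a capture group, capture the real target last, and let the replace callback decide. This is only a simplified sketch that handles double-quoted strings; `replaceOutsideStrings` is a hypothetical helper, not part of the class above:

```javascript
// Simplified exclusion pattern: a double-quoted string with escaped quotes allowed.
const stringLiteral = /"(?:\\"|[^"])*"/.source;

// Hypothetical helper: replace `target` only where it appears outside such strings.
function replaceOutsideStrings(source, target, replacement) {
  // Alternative 1 consumes whole strings (no capture group);
  // alternative 2 captures the real target in group 1.
  const pattern = new RegExp(`${stringLiteral}|(${target.source})`, 'g');
  return source.replace(pattern, (match, hit) =>
    hit === undefined ? match : replacement
  );
}

const out = replaceOutsideStrings('foo = "foo" + foo', /foo/, 'bar');
console.log(out); // bar = "foo" + bar
```

Because the regex engine tries alternatives left to right at each position, a quoted string is swallowed whole before the target pattern can match anything inside it.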
From this I created two more classes to home in on the two major types of matches I needed:
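// Matches `// ...` line comments and `<!-- ... -->` or `/* ... */` block comments whose body matches `regex`.
// The inner pattern is inserted twice, so any capture group in it appears once per comment form.
class CommentRegExp extends StringAwareRegExp {
    constructor(regex, flags){
        if(regex instanceof RegExp) regex = StringAwareRegExp.prototype.regExpToInnerRegexString(regex);
        return super(`\\/\\/${regex}$|(?:<!--|\\/\\s*\\*)\\s*${regex}\\s*(?:-->|\\*\\s*\\/)`, flags);
    }
}
// Matches a statement followed by optional whitespace and an optional semicolon.
class StatementRegExp extends StringAwareRegExp {
    constructor(regex, flags){
        if(regex instanceof RegExp) regex = StringAwareRegExp.prototype.regExpToInnerRegexString(regex);
        return super(`${regex}\\s*;?\\s*?`, flags);
    }
}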
And finally (for whatever use they may be to anyone), the regexes created from these classes:
const allCommentsRegex = new CommentRegExp(/[\s\S]*?/, 'gm');
const enableBabelRegex = new CommentRegExp(/enable-?_?\s?babel/, 'gmi');
const disableBabelRegex = new CommentRegExp(/disable-?_?\s?babel/, 'gmi');
const includeRegex = new CommentRegExp(/\s*(?:includes?|imports?|requires?)\s+(.+?)/, 'gm');
const importRegex = new StatementRegExp(/import\s+(?:(?:\w+|{(?:\s*\w\s*,?\s*)+})\s+from)?\s*['"`](.+?)['"`]/, 'gm');
const requireRegex = new StatementRegExp(/(?:var|let|const)\s+(?:(?:\w+|{(?:\s*\w\s*,?\s*)+}))\s*=\s*require\s*\(\s*['"`](.+?)['"`]\s*\)/, 'gm');
const atImportRegex = new StatementRegExp(/@import\s*['"`](.+?)['"`]/, 'gm');
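One thing to keep in mind when reading captures out of these: the wrapper adds its own group 1 around the whole target, so a capture group written in the inner pattern shifts to group 2. The sketch below rebuilds the same shape that `StatementRegExp` produces for `atImportRegex` using a plain `RegExp`, so it runs on its own; the sample `css` and the `paths` array are made up for illustration:

```javascript
// String/regex-literal exclusion prefix, copied from the class above (note the trailing `|`).
const disqualify = /\\\/|\/\s*(?:\\\/|[^\/\*\n])+\/|\\"|"(?:\\"|[^"])*"|\\'|'(?:\\'|[^'])*'|\\`|`(?:\\`|[^`])*`|/.source;

// Same shape as atImportRegex: prefix + (inner pattern + optional semicolon).
const inner = /@import\s*['"`](.+?)['"`]/.source;
const atImport = new RegExp(`${disqualify}(${inner}\\s*;?\\s*?)`, 'gm');

const css = `@import "base.css";
body { content: "@import 'fake.css'"; }
@import 'theme.css';`;

// Group 1 is the wrapper; it is undefined when the match was only a skipped string.
// The path captured by the inner pattern is therefore group 2.
const paths = [];
for (const m of css.matchAll(atImport)) {
  if (m[1] !== undefined) paths.push(m[2]);
}
console.log(paths); // [ 'base.css', 'theme.css' ]
```

The `@import` inside the `content` string is skipped because the whole string is consumed by the exclusion prefix. With `CommentRegExp` the inner pattern appears twice, so a capture such as the one in `includeRegex` should land in group 2 for `//` comments and group 3 for block comments.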
And lastly, if anyone cares to see it in use, here's the project I used it in (my personal projects are always a WIP): https://github.com/fatlard1993/page-compiler