
I need to remove all JavaScript comments from a JavaScript source using the JavaScript RegExp object.

What I need is the pattern for the RegExp.

So far, I've found this:

compressed = compressed.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, '');

This pattern works OK for:

/* I'm a comment */

or for:

/*
 * I'm a comment as well
*/

But it doesn't seem to work for the inline (single-line) comment:

// I'm an inline comment

I'm not much of an expert in regular expressions and their patterns, so I need help.
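A likely cause, assuming the failing `//` comment is on the last line with no line break after it: the lookahead `(?=[\n\r])` requires a line break after the comment. Also, `.` never matches a line break, so `.+?` cannot span a multi-line block comment. A sketch comparing the question's pattern with a hypothetical tweak (`[\s\S]` plus the `m` flag), on a made-up input:

```javascript
// The pattern from the question, unchanged.
const original = /\/\*.+?\*\/|\/\/.*(?=[\n\r])/g;
// Hypothetical tweak: [\s\S] also matches line breaks, and with the `m` flag
// `$` matches at the end of each line and at the end of the input.
const tweaked = /\/\*[\s\S]*?\*\/|\/\/.*$/gm;

const src = '/*\n * block\n*/\nx = 1; // last line, no trailing newline';

// Nothing is removed: the block spans lines and the `//` comment ends the input.
console.log(JSON.stringify(src.replace(original, '')));
// Both comments are removed.
console.log(JSON.stringify(src.replace(tweaked, '')));
```

Like the original, the tweak still treats `//` and `/*` inside strings and regex literals as comments.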

Also, I would like a RegEx pattern that removes all of those HTML-like comments:

<!-- HTML Comment //--> or <!-- HTML Comment -->

I would also like to remove the conditional HTML comments that can be found in various JavaScript sources.

Thanks.

metaforce
  • Related question: *[Regular expression for clean javascript comments of type //](http://stackoverflow.com/questions/4278739/regular-expression-for-clean-javascript-comments-of-type)* – Gumbo May 13 '11 at 08:55
  • This is tricky, since you can have `var str = "/* comment? */"` and the like, which would force you to parse JS in some way to get it right. – Qtax May 13 '11 at 08:57
  • @Qtax - It's even trickier than that! A correct solution must consider literal regexes as well as strings and comments. Consider the following: `var re = /\/*notacomment!*/;` and `m = /\//.test("notacomment!")` and `var re = /\/*/; // */ thiscommentishandledasascode!` and `var re = /"/; // " thiscommentishandledasascode!` – ridgerunner Aug 14 '13 at 13:54
  • @ridgerunner, that was my point, that you have to "parse" (tokenize) JS. Matching regex literals is only slightly more complicated than matching strings or comments. Not because of escapes, but due to the lack of them. For example `/[///]/`. But you probably need close to a full lexer to figure out that `9 /thisIsNotARegex/ 2`. – Qtax Aug 14 '13 at 20:39
  • Does this answer your question? [Remove HTML comments with Regex, in Javascript](https://stackoverflow.com/questions/5653207/remove-html-comments-with-regex-in-javascript) – justFatLard Oct 31 '20 at 01:10

18 Answers


NOTE: Regex is not a lexer or a parser. If you have some weird edge case where you need some oddly nested comments parsed out of a string, use a parser. For the other 98% of the time this regex should work.

I had pretty complex block comments going on with nested asterisks, slashes, etc. The regular expression at the following site worked like a charm:

http://upshots.org/javascript/javascript-regexp-to-remove-comments
(see below for original)

Some modifications have been made, but the integrity of the original regex has been preserved. In order to allow certain double-slash (//) sequences (such as URLs), you must use back reference $1 in your replacement value instead of an empty string. Here it is:

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm

// JavaScript: 
// source_string.replace(/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm, '$1');

// PHP:
// preg_replace("/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/m", "$1", $source_string);

DEMO: https://regex101.com/r/B8WkuX/1
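A small sketch of why the `$1` replacement matters: the second alternative consumes the character just before `//` into group 1, so replacing with an empty string deletes it. The sample input is made up for illustration:

```javascript
// Pattern from the answer above.
const re = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm;

const src = [
  'var a = 1;// trailing comment',
  'var url = "http://example.com"; // the colon protects http://',
  '/* block',
  '   comment */ var b = 2;',
].join('\n');

// '$1' puts back the character that preceded `//` (the `;` and the space).
const kept = src.replace(re, '$1');
// '' drops it, so `var a = 1;` loses its semicolon.
const lost = src.replace(re, '');

console.log(kept);
console.log(lost);
```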

FAILING USE CASES: There are a few edge cases where this regex fails. An ongoing list of those cases is documented in this public gist. Please update the gist if you can find other cases.

...and if you also want to remove <!-- HTML comments -->, use this (again with $1 as the replacement value):

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$|<!--[\s\S]*?-->/gm
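A sketch of the HTML-aware variant in use, written with the `gm` flags and with `$` attached to the `//` alternative, so that every match is replaced and `-->` does not have to end a line. The input reuses the HTML comments from the question:

```javascript
// Block comments, `//` comments (group 1 keeps the preceding character)
// and <!-- ... --> comments.
const re = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$|<!--[\s\S]*?-->/gm;

const src = '<!-- HTML Comment //--> or <!-- HTML Comment -->\nrun(); // done';

// When group 1 did not participate (HTML comments), '$1' becomes ''.
const out = src.replace(re, '$1');
console.log(JSON.stringify(out));
```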

(original - for historical reference only)

// DO NOT USE THIS - SEE ABOVE
/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm
Ryan Wheale
  • `(?:\/\*(?:[\s\S]*?)\*\/)|(?:^\s*\/\/(?:.*)$)` should be better as it wouldn't treat `//` in the middle of string, for example in urls – Eugene Nagorny Jun 18 '13 at 12:36
  • @Ideviantik - Thanks! I have updated my answer. Hopefully this continues to evolve, as your solution would skip over something like this: `var foo = "bar";// This is a comment` - so I added an optional semicolon in there. – Ryan Wheale Jun 18 '13 at 22:33
  • @RyanWheale I ended up with the `(^|\s+)//.*$|/\*(.|.\s)*?\*/` solution (I am using it in Python). – Eugene Nagorny Jun 19 '13 at 15:45
  • Seems to fail on this: `var foo = "everything /* in this string */ should be kept"` – DG. Oct 25 '13 at 13:01
  • @DG - Feel free to grab a javascript parser and use it for your extremely edge-case scenario. The regex above is not for parsing, but rather for removing typical comments within a file. If a parser is over-kill, I suggest you either encode your slashes (/) or asterisks (*) or use concatenation: `"everything /" + "* in this string *" + "/ should be kept"` – Ryan Wheale Oct 25 '13 at 19:46
  • @RyanWheale - Calm down. I'm just cautioning others to be aware. It also fails on `foo = "this //is.no.comment"`. But the biggest flaw is that it will strip ";" from `ab=a+b; // AB`. The original doesn't, but it has other flaws as acknowledged by the original author. BTW, your suggested workaround is only useful if I am responsible for the code that will be stripped. If that were the case, I could impose all sorts of restrictions on myself and writing the regex would be trivial. All that said, I've not found a perfect regex solution. It probably is (practically) impossible. – DG. Oct 26 '13 at 01:02
  • @DG - I didn't mean to convey frustration. This regex has helped me clean hundreds of files, and it has helped others too. Your last example is extremely rare and should be left for parsing ([regex is not for parsing](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)). Not trying to be sarcastic. You pointed out that the example was stripping semicolons, and I have updated the example accordingly. Thanks for pointing that out and helping evolve this solution. – Ryan Wheale Oct 26 '13 at 05:13
  • @RyanWheale thanks - this worked for me apart from it didn't capture )|(?:([\s;])+\/\/(?:.*)$)` – twiz911 Mar 12 '14 at 03:52
  • I had to remove the `;` in `[\s;]`, as it removes any `;` in front of '//', which if newlines are removed sometimes generates syntax errors. example input: `$('iframe').remove();NEWLINE// semicoloneatingcommentNEWLINE$();` // OK output: `$('iframe').remove();$();` // ERROR output: `$('iframe').remove()$();` Context: I use this comment stripper in combination with a newline stripper to quickly convert some javascript to a bookmarklet. `url = 'javascript:' + (''+s).replace(/(?:\/\*(?:[\s\S]*?)\*\/)|(?:([\s;])+\/\/(?:.*)$)/gm, '$1').replace(/\s*(\r)?\n\s*/g,'');` – MoonLite Feb 09 '16 at 15:45
  • Can you create a regexpal or fiddle showing the error. Any semicolons removed should be saved in the back-reference. – Ryan Wheale Feb 09 '16 at 17:56
  • It doesn't work when you put the double-slash comment directly after a variable, something like this: `tmpvalue = null; //clear memory`. Any solution? – The Doctor Apr 07 '18 at 19:22
  • @SUB-HDR Yes it does work. I just copied the regex and your code into regexpal and it matches the comment. You must be doing something else wrong. If you can't find the problem, please open a new question and refrain from using this thread for support. Thanks. – Ryan Wheale Apr 11 '18 at 17:08
  • No, it does not work. This regex also removes } immediately preceding //comments. This is an extremely dangerous bug. – John Smith Aug 16 '19 at 04:26
  • Try adding a line containing only "}//test" (without the quotes) to the Regextester link above to see it for yourself. – John Smith Aug 16 '19 at 04:36
  • Actually, looking at it using @RyanWheale's own demo link without modification, it also removes needed commas before double slashes. It appears to often indiscriminately remove whatever character appears before the //. This cannot be considered a safe working solution at all. – John Smith Aug 16 '19 at 04:39
  • @JohnSmith All of the use cases you cited work as intended. You simply didn't read. The regex itself will match the character immediately in front of double slashes. There are very specific instructions on how you are supposed to preserve that character. #rtfi – Ryan Wheale Aug 16 '19 at 23:33
  • upvote for useful in most case. does not work for this one: /**//// --- it results in '/', but I expect ''. – zipper Sep 01 '19 at 06:42
  • @RyanWheale If that is the case then the instructions are not sufficiently clear. Contrary to your accusation, I did read, and now I have just read a second time, and I still don't see what specific instructions you are talking about for preserving the preceding character. What are they? Your help is appreciated, but there's no need to get testy. #idrtfia – John Smith Sep 24 '19 at 13:03
  • @JohnSmith - Not getting testy - you must use backreference `$1` in your replacement. Failure to do so will result in the loss of any character which comes immediately before a double slash `//`. Not sure I can get much clearer. Hope that helps. – Ryan Wheale Oct 02 '19 at 01:20
  • @RyanWheale calms down for no one. no. one. This is a fantastic answer. And more importantly an outline of a thought process; so often, I see commenters and OPs simply saying, "Rrah, it doesn't work :(" Thinking about why we ought or ought not do things, or about why "the code don werks," is more important to our development as problem solvers than simply what's the fix for my specific one-off issue. Great Answer and good discussion. I've learned from this-- more than can be said of some of my own answers/posts, even. *Thx for highlighting that a parser may be the best fit straightaway. :-{P* – Todd Oct 27 '20 at 23:09

try this,

(\/\*[\w\'\s\r\n\*]*\*\/)|(\/\/[\w\s\']*)|(\<![\-\-\s\w\>\/]*\>)

should work :)

AabinGunz
  • What about `"foo /* bar */ baz"`? – Gumbo May 13 '11 at 08:54
  • Your regex will match all html tags and not only the comments. – stema May 13 '11 at 08:58
  • @stema,@Gumbo: Thanks for pointing out, I am new to learning regex. Hope this regex does the job. – AabinGunz May 13 '11 at 09:07
  • You should learn a bit about the square brackets, they are defining character classes, I am quite sure you are expecting another behaviour. [perlretut.html#Using-character-classes](http://perldoc.perl.org/perlretut.html#Using-character-classes) – stema May 13 '11 at 09:15
  • RegEx didn't quite help though. I was facing some issues like comment brackets in a string, \/\/* something like that. So I wrote a whole parser for removing strings (inline, multiline (works for CSS and JavaScript) and those HTML conditional comments/tags). Thanks for your help, all of you. And if anyone needs a function in javascript for removing these comments, please contact me as I would be pleased to help anyone with this parser. Bye. – metaforce May 13 '11 at 11:28
  • The above regex won't work if a multiline comment has non-word characters within it. To catch non-word characters I added a \W to the first regex. – ethicalhack3r Feb 14 '12 at 15:26
  • No solution with regex for this. You cannot distinguish if //this appears inside of code (string) or at the end of line (no way to count number (get even number) of quote characters ("|') so only after that find //comment) – Nevena Jul 11 '12 at 21:19
  • This will also match the // in http://, so it will be considered a comment, which it is NOT! – Mojtaba Apr 01 '13 at 18:23
  • Don't use this regex! it also matches `http://` and any other regex that has `//` or `/*`. So it's unusable – jonschlinkert Aug 09 '13 at 01:45
  • Unfortunately, this also breaks on `/* @param ... */` – John Weisz Dec 18 '15 at 18:47
  • With lines like `var x = 1; // Comment` LINE BREAK `var y = 2; // Comment` - it also fails – Adrian Lynch Feb 21 '16 at 15:24
  • This will also affect things like `var url = http://www.google.com` and break the entire script. – NoobishPro Aug 24 '17 at 15:57
  • Doesn't quite work, sorry had to downvote, I know it's very challenging to create a regex for this. Might be easier to create several regex and do more than one pass-through. –  Nov 01 '18 at 19:38
  • Does not work for `/* (line break) something like this (line break) */` – Xanlantos Mar 29 '21 at 12:48

I have been putting together an expression that needs to do something similar.
The finished product is:

/(?:((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)|(\/\*(?:(?!\*\/).|[\n\r])*\*\/)|(\/\/[^\n\r]*(?:[\n\r]+|$))|((?:=|:)\s*(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()|(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|(<!--(?:(?!-->).)*-->))/g

Scary right?

To break it down, the first part matches anything within single or double quotation marks.
This is necessary to avoid matching comment markers inside quoted strings:

((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)

The second part matches multiline comments delimited by /* */:

(\/\*(?:(?!\*\/).|[\n\r])*\*\/)

The third part matches single-line comments starting anywhere in the line:

(\/\/[^\n\r]*(?:[\n\r]+|$))

The fourth through sixth parts match regex literals.
This relies on the literal being preceded by an equals sign or a colon, being followed by a method call such as .test( or .replace(, or being passed as the argument of one of those methods:

((?:=|:)\s*(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))
((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()
(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))

The seventh part, which I originally forgot, removes the HTML comments:

(<!--(?:(?!-->).)*-->)

I had an issue with my dev environment issuing errors for a regex that broke across a line, so I used the following solution:

var ADW_GLOBALS = new Object
ADW_GLOBALS = {
  quotations : /((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)/,
  multiline_comment : /(\/\*(?:(?!\*\/).|[\n\r])*\*\/)/,
  single_line_comment : /(\/\/[^\n\r]*[\n\r]+)/,
  regex_literal : /(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)/,
  html_comments : /(<!--(?:(?!-->).)*-->)/,
  regex_of_doom : ''
}
ADW_GLOBALS.regex_of_doom = new RegExp(
  '(?:' + ADW_GLOBALS.quotations.source + '|' + 
  ADW_GLOBALS.multiline_comment.source + '|' + 
  ADW_GLOBALS.single_line_comment.source + '|' + 
  '((?:=|:)\\s*' + ADW_GLOBALS.regex_literal.source + ')|(' + 
  ADW_GLOBALS.regex_literal.source + '[gimy]?\\.(?:exec|test|match|search|replace|split)\\(' + ')|(' + 
  '\\.(?:exec|test|match|search|replace|split)\\(' + ADW_GLOBALS.regex_literal.source + ')|' +
  ADW_GLOBALS.html_comments.source + ')' , 'g'
);

changed_text = code_to_test.replace(ADW_GLOBALS.regex_of_doom, function(match, $1, $2, $3, $4, $5, $6, $7, $8, offset, original){
  if (typeof $1 != 'undefined') return $1;
  if (typeof $5 != 'undefined') return $5;
  if (typeof $6 != 'undefined') return $6;
  if (typeof $7 != 'undefined') return $7;
  return '';
});

This returns quoted strings and anything matched as a regex literal unchanged, and returns an empty string for all of the comment captures.

I know this is excessive and rather difficult to maintain but it does appear to work for me so far.
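The core idea (match strings in the same alternation as comments, then return them unchanged from the replace callback) can be shown in a much smaller sketch. This reduced version is my own simplification, not the answer's code: `TOKENS` and `stripKeepingStrings` are hypothetical names, and it drops the regex-literal handling entirely, so it assumes the input contains no regex literals.

```javascript
// Reduced sketch of the "match strings too, then put them back" idea.
// Assumption: the input has no regex literals (they are not recognized here).
const TOKENS = new RegExp([
  /(["'])(?:\\[\s\S]|(?!\1)[^\\\n])*\1/.source, // group 1: quote of a string literal
  /\/\*[\s\S]*?\*\//.source,                    // block comment
  /\/\/[^\n]*/.source,                          // line comment
  /<!--[\s\S]*?-->/.source,                     // HTML comment
].join('|'), 'g');

function stripKeepingStrings(code) {
  // Strings (group 1 set) are returned unchanged; every comment becomes ''.
  return code.replace(TOKENS, (match, quote) => (quote ? match : ''));
}

console.log(stripKeepingStrings('var s = "a /* b */ c"; // x'));
console.log(stripKeepingStrings("url = 'http://x'; /* c */ y"));
```

Because the string alternative is tried at the opening quote, the scan never starts a comment match inside a string, which is what keeps `http://` and `/* b */` intact here.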

wolffer-east
  • I get `SyntaxError: unterminated parenthetical` in Firefox. – DG. Oct 25 '13 at 13:03
  • I made some changes and threw up a js fiddle to make it easier to copy out. [link](http://jsfiddle.net/U4qeT/2/) Hopefully this helps. Please note - this will work on scripts and most other code, but if you get any free text with parens you will run into trouble. The code doesn't know how to deal with the ' in doesn't when it isn't itself in quotations – wolffer-east Nov 01 '13 at 20:58
  • "doesn't know how to deal with [a single quote if it doesn't appear] in quotations" - That is a VERY important fact to note. Frankly, it makes the expression unusable for most general purpose needs. It is very common to use single quotes instead of double quotes. But my testing shows much more serious problems with the expression. The test case in your fiddle is very limited. I have a far more extensive test case and the expression butchers it badly in many places. IMHO, it is pointless to try and fix. My research indicates strongly that no single regex can do the job adequately. – DG. Nov 02 '13 at 01:37
  • I came up with this to specifically deal with javascript code. Unfortunately it doesnt work with general text, but that is because it is a completely different use case. Anyways, could you put your more extensive test case in a fiddle and drop a link? it would be extremely helpful for me to know what issues this will break on. Even if no one else uses it, I need to know where it breaks for my own usage. – wolffer-east Nov 04 '13 at 14:49
  • It doesn't work. Transforms: function(field) { // comment example return new field('like').equal('no'); } into "function (field) {return new field().equal();}" Anything between quote is removed. – Julien L Jan 02 '14 at 22:23
  • @JulienL I put up a js fiddle with the code here: [link](http://jsfiddle.net/L2HcU/) and it appears to work. I copied and pasted the final code segment from my post into the section, then made a couple of changes. First, I wrapped the change portion in a function so it can be called from a link. Second, I targeted an element in the example to help demonstrate. And third, I returned \n in the final option to help fix a linebreak issue. None of these changes should have affected your code. In this case it looks to work, though if you run into any other cases that don't work, please let me know so I can debug. – wolffer-east Jan 07 '14 at 21:06
  • Why is `'\\.(?:exec|test|match|search|replace|split)\\('` not cached in `ADW_GLOBALS`? – yckart Dec 05 '14 at 03:04
  • @yckart Good point, looks like it is left out because I overlooked it. I remember leaving it out of the regex_literal to allow re use and then never added it as its own item. – wolffer-east Dec 08 '14 at 16:38
  • @wolffer-east Any ideas to name it? How would you? – yckart Dec 08 '14 at 18:02
  • @yckart probably something like `regex_methods` – wolffer-east Dec 08 '14 at 20:02
  • This is the greatest answer to any question I have ever seen. Thanks for breaking it down into parts! – Jon Doe Jul 11 '17 at 20:00

This works for almost all cases:

var RE_BLOCKS = new RegExp([
  /\/(\*)[^*]*\*+(?:[^*\/][^*]*\*+)*\//.source,           // $1: multi-line comment
  /\/(\/)[^\n]*$/.source,                                 // $2 single-line comment
  /"(?:[^"\\]*|\\[\S\s])*"|'(?:[^'\\]*|\\[\S\s])*'/.source, // - string, don't care about embedded eols
  /(?:[$\w\)\]]|\+\+|--)\s*\/(?![*\/])/.source,           // - division operator
  /\/(?=[^*\/])[^[/\\]*(?:(?:\[(?:\\.|[^\]\\]*)*\]|\\.)[^[/\\]*)*?\/[gim]*/.source // - regex
  ].join('|'),
  'gm'  // note: global+multiline with replace() needs testing
);

// remove comments, keep other blocks
function stripComments(str) {
  return str.replace(RE_BLOCKS, function (match, mlc, slc) {
    return mlc ? ' ' :         // multiline comment (replace with space)
           slc ? '' :          // single-line comment (remove)
           match;              // divisor, regex, or string, return as-is
  });
}

The code is based on regexes from jspreproc, a tool I wrote for the riot compiler.

See http://github.com/aMarCruz/jspreproc

aMarCruz

In plain simple JS regex, this:

my_string_or_obj.replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, ' ')
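Because this pattern puts the character in front of `//` into group 1, replacing every match with a plain space also deletes that character. A sketch comparing a space with a `'$1'` replacement (the sample input is made up):

```javascript
// Pattern from the answer above.
const re = /\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm;
const src = 'total = a + b;// sum\nurl = "http://example.com";';

const withSpace = src.replace(re, ' ');  // the `;` before `//` is lost
const withGroup = src.replace(re, '$1'); // the `;` is kept

console.log(JSON.stringify(withSpace));
console.log(JSON.stringify(withGroup));
```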
Jim O'Brien
Shobhit Sharma
  • this worked! although perhaps replace it with ' ' (a single space) instead of '' –  Nov 01 '18 at 19:40
  • Thanks! I've looked at like 10 different RegExes and this one was the only one that worked perfectly in each scenario! – Sv443 Feb 01 '19 at 15:35
  • Using the given regex, the code below gives `3//`: `p = /\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm; x='3//'; x.match(p);` – Himadhar H Jun 17 '21 at 09:33

A bit simpler, and this also works for multiline HTML comments:

(<!--.*?-->)|(<!--[\w\W\n\s]+?-->)

Aurielle Perlmann

A simple regex ONLY for multi-line (block) comments:

/\*((.|\n)(?!/))+\*/
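In a JavaScript regex literal the slashes must be escaped. A sketch of the same pattern written that way (my transcription), which also shows one limitation: because no character inside the comment may be followed by `/`, a block comment that contains a `/` is not matched at all.

```javascript
// The pattern above as a JavaScript literal; every `/` is escaped.
const re = /\/\*((.|\n)(?!\/))+\*\//g;

// Two ordinary block comments are removed.
console.log(JSON.stringify('/* a */ x /* b */'.replace(re, '')));
// A comment containing `/` is left untouched.
console.log(JSON.stringify('/* a/b */'.replace(re, '')));
```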
vantrung -cuncon

The accepted solution does not capture all common use cases. See examples here: https://regex101.com/r/38dIQk/1.

The following regular expression should match JavaScript comments more reliably:

/(?:\/\*(?:[^\*]|\**[^\*\/])*\*+\/)|(?:\/\/[\S ]*)/g

For demonstration, visit the following link: https://regex101.com/r/z99Nq5/1/.
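A sketch of the URL caveat raised in the comments, comparing the pattern above with the negative-lookbehind variant the author suggests there (lookbehind needs an ES2018-capable engine). The input is made up:

```javascript
// Pattern from the answer.
const base = /(?:\/\*(?:[^\*]|\**[^\*\/])*\*+\/)|(?:\/\/[\S ]*)/g;
// Variant with a negative lookbehind: `//` right after `http:` or `https:` is skipped.
const noUrls = /(?:\/\*(?:[^\*]|\**[^\*\/])*\*+\/)|(?:(?<!https?:)\/\/[\S ]*)/g;

const src = 'let v = "https://x.io"; // note\n/** doc **/ go();';

// The base pattern cuts the URL and everything after it on that line.
console.log(JSON.stringify(src.replace(base, '')));
// The lookbehind variant keeps the URL and removes only the comments.
console.log(JSON.stringify(src.replace(noUrls, '')));
```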

DRD
  • In truth that is enough: `/\/\*(?:[^*]|\**[^*/])*\*+\//g`. Thanks so much. – rplaurindo Nov 02 '21 at 11:35
  • Note that your solution will also remove valid urls, which is certainly not desirable. – nickpapoutsis Aug 29 '22 at 17:46
  • Then adjust the regex to suit your purposes. For example you can put a negative look behind (if it applies to your execution environment) to ignore forward slashes that are preceded by double or single quotes or `http`. – DRD Aug 29 '22 at 19:30
  • @rplaurindo, the regex you are suggesting will not match single-line `//` comments. – DRD Aug 29 '22 at 19:31
  • In fact, @DRD, I'm currently using `/\/\*.+?\*+\//gs` to match block comments and `/\/\/.*\n*/g` to match line comments, instead of a single pattern for both kinds of comments. – rplaurindo Aug 30 '22 at 12:14
  • Makes sense. Turning `+` into lazy via `?` takes care of preventing selection of all comments as one block. And use of `s` flag to make dot `.` match new lines contributes to a shorter regex. However, using `/\/\/.*\n*/g` instead of `/\/\/[\S ]*/g` will match the newlines also. Unless, that's the objective, I would stick to just matching the single-line comment sans the newline character. – DRD Aug 30 '22 at 21:34
  • This fails for strings with urls like let v = "https://..." and will treat them as comments. – Johncl Feb 17 '23 at 11:17
  • @Johncl, this was meant to be a base rather than cover-all scenarios answer. If your code has a pattern as you described, then adjust the regex by adding, say, negative look behinds: `(?:\/\*(?:[^\*]|\**[^\*\/])*\*+\/)|(?:(?<!https?:)\/\/[\S ]*)` https://regex101.com/r/ZarEY5/1 – DRD Mar 11 '23 at 21:10

For /**/ and // comments:

/(?:(?:\/\*(?:[^*]|(?:\*+[^*\/]))*\*+\/)|(?:(?<!\:|\\\|\')\/\/.*))/gm

MaZzIMo24

This is late to be of much use to the original question, but maybe it will help someone.

Based on @Ryan Wheale's answer, I've found this to work as a comprehensive capture to ensure that matches exclude anything found inside a string literal.

/(?:\r\n|\n|^)(?:[^'"])*?(?:'(?:[^\r\n\\']|\\'|[\\]{2})*'|"(?:[^\r\n\\"]|\\"|[\\]{2})*")*?(?:[^'"])*?(\/\*(?:[\s\S]*?)\*\/|\/\/.*)/g

The last group (all others are discarded) is based on Ryan's answer. Example here.

This assumes code is well structured and valid javascript.

Note: this has not been tested on poorly structured code which may or may not be recoverable depending on the javascript engine's own heuristics.

Note: this should hold for valid javascript < ES6, however, ES6 allows multi-line string literals, in which case this regex will almost certainly break, though that case has not been tested.


However, it is still possible to match something that looks like a comment inside a regex literal (see comments/results in the Example above).

I use the above capture after replacing all regex literals using the following comprehensive capture extracted from es5-lexer here and here, as referenced in Mike Samuel's answer to this question:

/(?:(?:break|case|continue|delete|do|else|finally|in|instanceof|return|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])|[!%&(:;<=>?[^{|}~])?(\/(?![*/])(?:[^\\\[/\r\n\u2028\u2029]|\[(?:[^\]\\\r\n\u2028\u2029]|\\(?:[^\r\n\u2028\u2029ux]|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}))+\]|\\(?:[^\r\n\u2028\u2029ux]|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}))*\/[gim]*)/g

For completeness, see also this trivial caveat.

Community
Nolo

If you follow the link below, you will find a comment removal script written with regular expressions.

It is 112 lines of code that work together, and it also works with MooTools, Joomla, Drupal and other CMS websites. I tested it on 800,000 lines of code and comments, and it works fine. It also selects multiple nested parentheses like ( abc(/nn/('/xvx/'))"// testing line") and comments that are between colons, and protects them. 23-01-2016. This is the code with the comments in it.

Click Here

Community
John Smith
  • Do NOT post copy-paste answers multiple times: [1](http://stackoverflow.com/a/34828197/1743880) [2](http://stackoverflow.com/a/34828160/1743880) [3](http://stackoverflow.com/a/34826806/1743880). You should post one good answer and flag the other as duplicate instead. – Tunaki Jan 16 '16 at 14:46
  • Deleted the near-duplicates on the same pages 3X (source file). How do you flag as duplicate? I got this answer on 3 pages so people can find it with ease. I think I should flag the other two as duplicates; do you mean I should copy a link to the one that's on here already? Still learning what is proper for a forum like this one. – John Smith Jan 23 '16 at 00:41

I was looking for a quick Regex solution too, but none of the answers provided work 100%. Each one ends up breaking the source code in some way, mostly due to comments detected inside string literals. E.g.

var string = "https://www.google.com/";

Becomes

var string = "https:

For the benefit of those coming in from Google, I ended up writing a short function (in JavaScript) that achieves what the regex couldn't do. Modify it for whatever language you are using to parse JavaScript.

function removeCodeComments(code) {
    var inQuoteChar = null;
    var inBlockComment = false;
    var inLineComment = false;
    var inRegexLiteral = false;
    var newCode = '';
    for (var i=0; i<code.length; i++) {
        if (!inQuoteChar && !inBlockComment && !inLineComment && !inRegexLiteral) {
            if (code[i] === '"' || code[i] === "'" || code[i] === '`') {
                inQuoteChar = code[i];
            }
            else if (code[i] === '/' && code[i+1] === '*') {
                inBlockComment = true;
            }
            else if (code[i] === '/' && code[i+1] === '/') {
                inLineComment = true;
            }
            else if (code[i] === '/' && code[i+1] !== '/') {
                inRegexLiteral = true;
            }
        }
        else {
            if (inQuoteChar && ((code[i] === inQuoteChar && code[i-1] != '\\') || (code[i] === '\n' && inQuoteChar !== '`'))) {
                inQuoteChar = null;
            }
            if (inRegexLiteral && ((code[i] === '/' && code[i-1] !== '\\') || code[i] === '\n')) {
                inRegexLiteral = false;
            }
            if (inBlockComment && code[i-1] === '/' && code[i-2] === '*') {
                inBlockComment = false;
            }
            if (inLineComment && code[i] === '\n') {
                inLineComment = false;
            }
        }
        if (!inBlockComment && !inLineComment) {
            newCode += code[i];
        }
    }
    return newCode;
}
user2867288

2019:

All other answers are incomplete and full of shortcomings, so I took the time to write a complete answer that works.

function stripComments(code){
        const savedText = [];
        return code
           .replace(/(['"`]).*?\1/gm,function (match) {
            var i = savedText.push(match);
            return (i-1)+'###';
        })
        // remove  // comments
        .replace(/\/\/.*/gm,'')
        // now extract all regex and save them
        .replace(/\/[^*\n].*\//gm,function (match) {
            var i = savedText.push(match);
            return (i-1)+'###';
        })
        // remove /* */ comments
        .replace(/\/\*[\s\S]*?\*\//gm,'')
        // remove <!-- --> comments
        .replace(/<!--[\s\S]*?-->/gm, '')
        .replace(/\d+###/gm,function(match){
            var i = Number.parseInt(match);
            return  savedText[i];
        })
       
    }
    var cleancode = stripComments(stripComments.toString())
    console.log(cleancode)

Other answers do not work on sample code like this:

// won't execute the creative code ("Can't execute code form a freed script"),
navigator.userAgent.match(/\b(MSIE |Trident.*?rv:|Edge\/)(\d+)/);

function stripComments(code){
    const savedText = [];
    return code
          // extract strings and save them
        .replace(/(['"`]).*?\1/gm,function (match) {
            savedText.push(match);
            return '###';
        })
        // remove  // comments
        .replace(/\/\/.*/gm,'')
        // now extract all regex and save them
        .replace(/\/[^*\n].*\//gm,function (match) {
            savedText.push(match);
            return '###';
        })
        // remove /* */ comments
        .replace(/\/\*[\s\S]*?\*\//gm,'')
        // remove <!-- --> comments
        .replace(/<!--[\s\S]*?-->/gm, '')
        // put the saved texts back, in the order they were saved
        .replace(/###/gm,function(){
            return savedText.shift();
        })
   
}
var cleancode = stripComments(stripComments.toString())
console.log(cleancode)
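Whether the block-comment step uses a greedy `[\s\S]*` or a lazy `[\s\S]*?` matters as soon as a file has more than one block comment. A minimal sketch with a made-up input:

```javascript
const src = '/* a */ keep(); /* b */';

// Greedy: runs from the first `/*` to the LAST `*/`, deleting the code between.
const greedy = src.replace(/\/\*[\s\S]*\*\//g, '');
// Lazy: stops at the first `*/` after each `/*`.
const lazy = src.replace(/\/\*[\s\S]*?\*\//g, '');

console.log(JSON.stringify(greedy)); // ""
console.log(JSON.stringify(lazy));   // " keep(); "
```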
pery mimon

I wonder if this was a trick question given by a professor to students. Why? Because it seems to me that it is IMPOSSIBLE to do this with regular expressions in the general case.

Your code (or whoever's code it is) can contain valid JavaScript like this:

let a = "hello /* ";
let b = 123;
let c = "world */ ";

Now if you have a regexp that removes everything between a pair of /* and */, it would break the code above: it would remove the executable code in the middle as well.

If you instead devise a regexp that does not remove comments containing quotes, then such comments are never removed. That applies to single quotes, double quotes and backquotes.

It seems to me that you cannot remove (all) comments from JavaScript with regular expressions; maybe someone can point out a way to handle the case above.

What you can do is build a small parser which goes through the code character by character and knows when it is inside a string and when it is inside a comment, and when it is inside a comment inside a string and so on.

I'm sure there are good open source JavaScript parsers that can do this. Maybe some of the packaging and minifying tools can do this for you as well.
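A minimal sketch of such a character-by-character scanner. It is my own illustration, not taken from any library, and `stripCommentsScanner` is a hypothetical name: it tracks only strings and comments, and it assumes the input has no regex literals and no `${...}` expressions inside template literals, both of which a real tokenizer would have to handle.

```javascript
// Character-by-character scanner that knows whether it is inside a string or
// a comment. Assumptions: no regex literals, and no `${...}` expressions inside
// template literals (a real tokenizer would need to handle both).
function stripCommentsScanner(code) {
  let out = '';
  let i = 0;
  while (i < code.length) {
    const ch = code[i];
    const next = code[i + 1];
    if (ch === '"' || ch === "'" || ch === '`') {
      // Copy the whole string literal; a backslash skips the next character.
      let j = i + 1;
      while (j < code.length && code[j] !== ch) {
        j += code[j] === '\\' ? 2 : 1;
      }
      out += code.slice(i, j + 1);
      i = j + 1;
    } else if (ch === '/' && next === '*') {
      // Skip the block comment up to and including its end marker (or to end of input).
      const end = code.indexOf('*/', i + 2);
      i = end === -1 ? code.length : end + 2;
    } else if (ch === '/' && next === '/') {
      // Skip to the end of the line; the newline itself is kept.
      const end = code.indexOf('\n', i + 2);
      i = end === -1 ? code.length : end;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// The example from this answer, plus a trailing line comment.
const sample = 'let a = "hello /* ";\nlet b = 123;\nlet c = "world */ "; // done';
console.log(stripCommentsScanner(sample));
```

Because strings are copied as a unit before any comment check runs, the `/*` and `*/` inside `a` and `c` never start or end a comment.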

Panu Logic

For block comment: https://regex101.com/r/aepSSj/1

Match a slash character (captured as group 1, referenced later as \1) only if it is followed by an asterisk:

(\/)(?=\*)

then match that asterisk (the lookahead only checked for it):

(?:\*)

followed by any characters, including the slash from group 1, matched lazily (as few as possible) and captured as group 2:

((?:\1|[\s\S])*?)

followed by an asterisk and the slash from group 1:

(?:\*)\1
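The four pieces above can be concatenated into a single JavaScript regex literal. This is my assembly of them and may differ from the linked regex101 versions; group 2 holds the comment body:

```javascript
// (\/)(?=\*)  +  (?:\*)  +  ((?:\1|[\s\S])*?)  +  (?:\*)\1
const blockComment = /(\/)(?=\*)(?:\*)((?:\1|[\s\S])*?)(?:\*)\1/;

const m = 'x = 1; /* hi */ y = 2;'.match(blockComment);
console.log(m[0]); // prints: /* hi */
console.log(m[2]); // prints the body " hi " (with its spaces)

// With the `g` flag, every block comment is removed.
const globalBlock = new RegExp(blockComment.source, 'g');
console.log(JSON.stringify('/* a */ b /* c */'.replace(globalBlock, '')));
```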

For block and/or inline comment: https://regex101.com/r/aepSSj/2

where | means "or", and (?=\/\/(.*)) captures everything after any //

or https://regex101.com/r/aepSSj/3 to capture the third part too

all in: https://regex101.com/r/aepSSj/8

Adrian Miranda

DEMO: https://onecompiler.com/javascript/3y825u3d5

const context = `
<html>
<script type="module">
/* I'm a comment */
/*
 * I'm a comment aswell url="https://example.com/"; 
*/
var re = /\\/*not a comment!*/; 
var m = /\\//.test("\"not a comment!\"");
var re = /"/; // " thiscommentishandledasascode!
const s1 = "multi String \\
    \\"double quote\\" \\
 // single commet in str \\
 /* multiple lines commet in str \\
    secend line */    \\
last line";

const s2 = 's2"s';
const url = "https://example.com/questions/5989315/";
let a = "hello /* ";
let b = 123;
let c = "world */ ";
//public static final String LETTERS_WORK_FOLDER = "/Letters/Generated/Work";

console.log(/*comment in 
    console.log*/ "!message at console.log");

function displayMsg(        // the end comment
    /*comment arg1*/ a, ...args) {
  console.log("Hello World!", a, ...args)
}
<\/script>
<body>
<!-- HTML Comment //--> or <!-- HTML Comment -->
<!--
function displayMsg() {
  alert("Hello World!")
}
//-->
</body>
</html>
`;
console.log("before:\n" + context);
console.log("<".repeat(100));
const save = {'txt':[], 'comment':[], 'regex': []};
const context2 = 
    context.replace(/(['"`]|\/[\*\/]{0,1}|<!\-\-)(?:(?=(?<=\/\*))[\s\S]*?\*\/|(?=(?<=\/\/)).*|(?=(?<=<!\-\-))[\s\S]*?\-\->|(?=(?<=[\s\=]\/)).+?(?<!\\)\/|(?=(?<=['"`]))[\s\S]*?(?<!\\)\1)/g,     
function (m) {
    const t = (m[0].match(/["'`]/) && 'txt') || (m.match(/^(\/\/|\/\*|<)/) && 'comment') || 'regex';
    save[t].push(m);
    return '${save.'+t+'['+(save[t].length - 1)+']}';
}).replace(/[\S\s]*/, function(m) {
    console.log("watch:\n"+m);
    console.log(">".repeat(100));
    /*
        @@remove comment
            save.comment = save.comment.map(_ => _.replace(/[\S\s]+/,""));
        @@replace comment
            save.comment = save.comment.map(_ => _.replace(/console\.log/g, 'CONSOLE.LOG'));
        @@replace text
            save.txt = save.txt.map(_ => _.replace(/console\.log/g, 'CONSOLE.LOG'));
        @@replace your code
        m = m.replace(/console\.log/g, 'console.warn');
    */
    // console.warn("@@remove comment -> save.comment.fill('');");
    save.comment.fill('');
    return m;
}).replace(/\$\{save.(\w+)\[(\d+)\]\}/g, function(m, t, id) {
    return save[t][id];
}).replace(/[\S\s]*/, function(m) {
    console.log("result:", m);
    // console.log("compare:", (context === m));
    return m;
})

My English is not good; if someone can help improve the wording of the explanation below, I will be very grateful.

Consider the following problems.

A. There may be strings inside comments, or comments inside strings, for example:

  1. /*
    const url="https://example.com/";

    */

  2. const str = "i am a string and /*comment in string*/";

B. A ", ' or ` inside a string delimited by the same quote character is escaped with a backslash (\), for example:

  1. const str = "my name is \"john\"";
  2. const str2 = 'i am "john\'s" friend';

Chaining several separate regex replacements (one per kind of comment or string) runs into the problems above. Instead, first find where a token begins, i.e. one of

 " ' ` // /* <!--

using the regex

(['"`]|\/[\*\/]|<!\-\-)

(written with fewer escapes: (['"`]|/[*/]|<!\-\-)), whose capture group \1 is one of ', ", `, /*, // or <!--.

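Then emulate If-Then-Else conditionals in regular expressions

https://www.regular-expressions.info/conditional.html

JavaScript does not support the (?(condition)then|else) syntax, so each branch of the alternation below is guarded by a lookbehind that checks which opener was matched:

(?:(?=(?<=\/\*))[\s\S]*?\*\/|(?=(?<=\/\/)).*|(?=(?<=<!\-\-))[\s\S]*?\-\->|[\s\S]*?(?<!\\)\1)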

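if: (?=(?<=\/\*))[\s\S]*?\*\/

(?=(?<=\/\*)) is the positive lookbehind (?<=\/\*) wrapped in a lookahead (equivalent to the lookbehind alone); it succeeds when the opener was /*. A /* multi-line comment ends at the first */ that follows it.

[\s\S]*?\*\/ matches the complete /* ... */ comment, even across lines.

else if: (?=(?<=\/\/)).*

(?=(?<=\/\/)) uses the positive lookbehind (?<=\/\/) to catch a // single-line comment.

.* matches the rest of that single-line comment (. does not match line breaks, so it stops at the end of the line).

else if: (?=(?<=<!\-\-))[\s\S]*?\-\->

(?=(?<=<!\-\-)) uses the positive lookbehind (?<=<!\-\-) to catch an <!-- HTML comment.

[\s\S]*?\-\-> matches the complete <!-- ... --> comment, even across lines.

else: [\s\S]*?(?<!\\)\1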
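A condensed sketch of the single-pass idea, assembled from the opener and the four branches (using [\s\S]*? for the string branch, as the script does). Assumptions: the regex-literal branch of the full script is left out for brevity, the lookahead wrappers are dropped because a bare lookbehind behaves the same, and `removeComments` is a hypothetical name.

```javascript
// Group 1 is the opener: ' " ` /* // or <!--.
// Each branch is guarded by a lookbehind that checks which opener matched.
const tokenRe =
  /(['"`]|\/[*\/]|<!--)(?:(?<=\/\*)[\s\S]*?\*\/|(?<=\/\/).*|(?<=<!--)[\s\S]*?-->|[\s\S]*?(?<!\\)\1)/g;

// Comments are replaced by "", strings are returned unchanged.
function removeComments(code) {
  return code.replace(tokenRe, (match, opener) =>
    opener === "/*" || opener === "//" || opener === "<!--" ? "" : match
  );
}

const src = [
  'let a = "hello /* ";',
  "let b = 123; // note",
  'let c = "world */ ";',
  "const s = 'it\\'s // fine';",
  "/* block */ let d = 4; <!-- html -->",
].join("\n");

console.log(removeComments(src));
```

Because strings are matched as whole tokens first, the /* and // inside them are never seen as comment openers.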

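Finally, the strings need to be processed. Using just [\s\S]*?\1 can give the wrong result for strings such as "STR\"..." or 'STR"S\'...', because the match stops at the escaped quote.

To fix this, add a negative lookbehind after [\s\S]*?: [\s\S]*?(?<!\\)\1 ignores quotes that are preceded by a backslash.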
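A small comparison of the two string branches, as a sketch. It also shows a limitation of the (?<!\\) check: when a string ends with an escaped backslash ("a\\"), the real closing quote is preceded by a backslash and gets skipped, so the match runs on into the next string.

```javascript
// Naive branch: stops at the first quote, even an escaped one.
const naive = /(["'`])[\s\S]*?\1/;
// Branch with a negative lookbehind: skips quotes preceded by a backslash.
const escaped = /(["'`])[\s\S]*?(?<!\\)\1/;

const src = String.raw`x = "STR\" more"; y = 1;`;
console.log(src.match(naive)[0]);
console.log(src.match(escaped)[0]);

// Limitation: an escaped backslash right before the closing quote.
const tricky = String.raw`x = "a\\"; y = "b";`;
console.log(tricky.match(escaped)[0]); // overshoots into the next string
```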


  • 1
    Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you’ve made. – jasie Jun 09 '22 at 13:34
  • Thank you for the reminder, I will try my best – pascual.lin Jun 10 '22 at 07:57
  • you are welcome, but non-english is forbidden on SO! – jasie Jun 10 '22 at 09:05
  • I have removed the non-English text and fixed a regex bug in the script – pascual.lin Jun 11 '22 at 06:52
0

Regex for identifying inline comments in JSON.

/[^"\S]+\/\/.+$/gm

Test cases used:

{
    // regex is cool
    "property": "http://regeixisfun.com", // regex is still cool
// http://regexisfun.com
    "property2": "One//regex//is//fun"
// regex is very "cool"
    "property3": "regex",
    //// regex is really "cool"
}

Test it here: https://regex101.com/r/YeyVxv/2
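A sketch of applying the pattern in JavaScript before `JSON.parse`. Since " is not whitespace, [^"\S] is effectively any whitespace character, so a comment needs at least one whitespace character before //; for a comment on its own line that is the preceding line break, which gets removed too (harmless in JSON). Caveat: a string value containing a space before // (for example "a // b") would be cut as well.

```javascript
// [^"\S]+ : one or more whitespace characters (including a preceding newline)
// \/\/.+$ : the // and the rest of the line (m flag makes $ match per line)
const jsonLineComment = /[^"\S]+\/\/.+$/gm;

const input = `{
    // regex is cool
    "property": "http://regexisfun.com", // regex is still cool
    "property2": "One//regex//is//fun"
}`;

const cleaned = input.replace(jsonLineComment, "");
console.log(JSON.parse(cleaned));
```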

Asher G.
-1

Based on the attempts above (mostly Abhishek Simon's) and using UltraEdit, I found this works for inline comments and handles all of the characters within the comment.

(\s\/\/|$\/\/)[\w\s\W\S.]*

This matches comments at the start of the line or with a space before //

//public static final String LETTERS_WORK_FOLDER = "/Letters/Generated/Work";

but not

"http://schemas.us.com.au/hub/'>" +

so it only fails for something like

if(x){f(x)}//where f is some function

which would need to be written as

if(x){f(x)} //where f is function
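A JavaScript adaptation of the same idea, as a sketch rather than the original pattern. In JavaScript, [\w\s\W\S.] matches every character including line breaks, so the original would run to the end of the input, and $\/\/ can never match because $ is only followed by a line break or the end of input. The version below assumes (?<!\S) ("at the start of the input or a line, or after whitespace") and uses . so the match stops at the end of the line.

```javascript
// (?<!\S) : not preceded by a non-whitespace character
// \/\/.*  : the // and the rest of the line (. does not cross line breaks)
const lineComment = /(?<!\S)\/\/.*/g;

const src = [
  '//public static final String LETTERS_WORK_FOLDER = "/Letters/Generated/Work";',
  '"http://schemas.us.com.au/hub/\'>" +',
  "if(x){f(x)}//where f is some function",
  "if(x){f(x)} //where f is function",
].join("\n");

console.log(src.replace(lineComment, ""));
```

As in the original, the URL is kept and the comment glued to } on the third line is not removed.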

Steve Black
  • 1
    Note that it doesn't work on inline comments without anything to the left of the symbols "//". Example of this failure: https://regex101.com/r/UuFDLC/2 – Alberto Schiabel Jul 06 '17 at 08:53