
How to find this pattern using regex?

C style block comments

/* xxxxxxxxxxxx */

linquize
  • What if `/*` is on the same line and after a `//`, meaning it does not start a comment? – Patashu Apr 23 '13 at 03:04
  • No, I do not need to handle that situation – linquize Apr 23 '13 at 03:07
  • Besides using regular expressions, if you wanted to go all-out you could use a language parser with a C (or whatever) language grammar. Examples are yacc, javacc, antlr – Jasper Blues Apr 23 '13 at 03:41
  • See this [comment](http://stackoverflow.com/questions/16086617/removing-comments-with-a-sliding-window-without-nested-while-loops#comment22966260_16086617) to a question about recognizing C comments for some ideas about the complexity of dealing with C comments 'properly' (meaning 'in the way that a C compiler must deal with C comments'). It is not straightforward. You're probably dealing only with simple comments, but spare a thought for the compiler writer. I suspect there are other relevant questions and answers too. (No; this question is not a duplicate of the one referenced.) – Jonathan Leffler Apr 23 '13 at 07:12
  • I tried writing this in Java as `"\\/\\*(\\*(?!\\/)|[^*])*\\*\\"`, but with the match function of the String class it throws `java.util.regex.PatternSyntaxException: Unexpected internal error near index 23`. Does anyone know why? – Yahia Farghaly Jan 21 '18 at 18:54
  • This worked for me: \/\*.*?\*\/ – orellabac Sep 26 '20 at 22:47
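
A possible explanation for the exception mentioned in the comments above, sketched in Python rather than Java: once the Java string escapes are resolved, the pattern is 23 characters long and ends in a lone backslash, because the final `/` is missing. Index 23 is therefore the end of the pattern. The variable names below are hypothetical, and Python's error message differs from Java's; this only illustrates that a dangling backslash is rejected.

```python
import re

# Regex text that the Java literal in the comment resolves to:
# each "\\" in Java source is a single backslash in the pattern.
# A raw string cannot end in a backslash, so the last one is appended.
pattern_from_comment = r"\/\*(\*(?!\/)|[^*])*\*" + "\\"

try:
    re.compile(pattern_from_comment)
    compiled = True
except re.error as exc:
    # The trailing backslash escapes nothing, so compilation fails.
    compiled = False
    print("rejected:", exc)

# Restoring the final "/" gives the pattern from the answer below.
fixed = pattern_from_comment + "/"
print(re.compile(fixed).search("a /* x */ b").group(0))
```

If this reading is right, ending the Java literal with `\\*\\/"` instead of `\\*\\"` should avoid the exception.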

1 Answer


Try using

\/\*(\*(?!\/)|[^*])*\*\/

to capture single-line and multi-line block comments. It searches for /* followed by any number of either:

  • an * that is not followed by a /
  • any char except *

and then the closing */.
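
As a quick check of the pattern above, here is a minimal Python sketch. Python is an assumption (the question does not name a language), and the sample text and the names `BLOCK_COMMENT`, `sample`, and `find_block_comments` are made up for illustration.

```python
import re

# The pattern from the answer: "/*", then any run of either
# "*" not followed by "/" or any character other than "*", then "*/".
BLOCK_COMMENT = re.compile(r"\/\*(\*(?!\/)|[^*])*\*\/")

sample = """int a; /* single line */
int b; /* spans
   two lines */
int c; /* stars ** inside **/
"""


def find_block_comments(text):
    # finditer + group(0) yields whole matches; findall would return
    # only what the capturing group matched last.
    return [m.group(0) for m in BLOCK_COMMENT.finditer(text)]


for comment in find_block_comments(sample):
    print(repr(comment))
```

Because `[^*]` also matches newlines, no extra flag is needed for multi-line comments in this sketch.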

Campfire
  • You don't need to match whitespace characters in a separate branch; `[^*/]` has them covered. All the `|\s` does is open you up to [catastrophic backtracking](http://www.regular-expressions.info/catastrophic.html). Also, you need to get that slash out of there, or your regex will fail to match comments with slashes inside them. – Alan Moore Apr 23 '13 at 11:13
  • Changed according to your suggestions (although the OP said that comments with slashes do not seem to be an issue) – Campfire Apr 23 '13 at 14:14
  • Why not `/\*(.(?!\*/))*\*/`? First a `/*`, then any character not followed by `*/`, then `*/` – zzh1996 Jan 23 '17 at 08:49
  • Wouldn't it be simpler to use `/\*.*?\*/`? – ensonic May 21 '17 at 19:17
  • I'd like to add the raw string `r"/[*]([^*]|([*][^/]))*[*]/"`, as it worked in Python! – KRoy Jan 13 '18 at 21:13
  • @ensonic It would definitely be simpler, however, `?`s are not always as performant: https://stackoverflow.com/a/36328890/3492994 – Stefnotch Apr 02 '18 at 19:30
  • @shuva, I quite like your regex, but right now it misses `/* ... **/`, since the second-to-last star would match `[*][^/]`, which also eats the final star. As a quick fix I used `r"/[*]([^*]|([*][^/]))*[*]+/"` (note the extra plus at the end, allowing additional trailing stars). – Charles Ofria Apr 14 '19 at 16:37
  • friendly reminder that this one is wrong, it does not handle the * correctly. – Andrew Shi Oct 08 '20 at 09:03
  • @ensonic: Is it just me or does your solution not cover multiline comments? – Campfire Nov 06 '20 at 13:33
  • @AndrewShi It is? Just tried it again at regexr.com and it worked just fine for single and multi line block comments. – Campfire Nov 06 '20 at 13:34
  • @Campfire I think this depends on the specific regex parser; some include newline in `.`, some don't – afarley Mar 07 '23 at 19:52
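
Two points from the comments above can be checked with a short Python sketch (the sample string and variable names are hypothetical). In Python's `re` module, `.` does not match a newline unless `re.DOTALL` is given, which may explain why the lazy pattern appears to miss multi-line comments in some tools. The two-character-branch pattern misses a comment ending in `**/` unless the trailing `+` is added.

```python
import re

# A multi-line comment that ends with two stars before the slash.
text = "x = 1; /* first\n   second **/ y = 2;"
expected = "/* first\n   second **/"

# Lazy pattern from the comments.
lazy = r"/\*.*?\*/"
no_flag = re.search(lazy, text)               # "." stops at the newline
with_flag = re.search(lazy, text, re.DOTALL)  # "." may cross newlines

# Pattern from KRoy's comment and the fix from Charles Ofria's comment.
two_char = r"/[*]([^*]|([*][^/]))*[*]/"
two_char_fixed = r"/[*]([^*]|([*][^/]))*[*]+/"
missed = re.search(two_char, text)
caught = re.search(two_char_fixed, text)

print(no_flag, with_flag is not None, missed, caught is not None)
```

Other regex engines may treat `.` differently by default, so the flag needed (if any) depends on the tool in use.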