6

I need a C# regex that deletes everything between /* and */, including the /* and */ delimiters themselves. Basically, it should remove all block comments from the given text.

H H
Andrej
  • 1
    you really don't need a regex for that. – Brian Driscoll May 26 '11 at 12:16
  • So what is the question? – Renatas M. May 26 '11 at 12:19
  • 1
    That is not that easy. Your code may contain strings like "This: /* boo */ is no comment". – Jens May 26 '11 at 12:31
  • 1
    Or commented comments: `// no comment here /*`, followed by `WillBeRemoved(); /* real comment */`. Ok, not too common, but you can get very creative with messing this up. – Kobi May 26 '11 at 12:46
  • 3
    C# is not a *regular language*, so it is impossible to recognize it correctly with a *regular expression*. If you want to remove comments correctly then what you have to build is a *lexer*. Break the text up into tokens and identify which tokens are comments. – Eric Lippert May 26 '11 at 15:18
  • Why on earth would you want to remove comments in a piece of code. Please do not make programmers in general look stupid by actually doing this. – Security Hound May 26 '11 at 18:04
  • 2
    @Eric - although they are certainly not the right tool for this job, .NET regular expressions are not limited to recognizing regular languages (e.g. see http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition). – kvb May 26 '11 at 19:34
  • 1
    @Ramhound: There are lots of reasons to remove comments. For example, when compressing code that is going to be delivered over a highly performance-sensitive channel where it's not going to be read by humans on the other end. – Eric Lippert May 26 '11 at 20:28

3 Answers

6

Should be something like this:

var regex = new Regex(@"/\*((?!\*/).)*\*/", RegexOptions.Singleline);

// Replace returns a new string; the input itself is not modified.
string result = regex.Replace(input, "");
petho
2

Be aware that comments can sometimes be nested. If they can, as in some SQL dialects, the basic regex is going to look like this:

/\*.*?\*/

You'll then need to loop until you're stripping nothing.

If, by contrast, comments end on the first */ like in C, you need it greedy with a negative lookahead:

/\*((?!\*/).)*\*/
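
For the nested case, one way to sketch the "loop until nothing is stripped" idea in C# is shown below. Note that it swaps in a different pattern from the ones above: it only matches a comment that contains no further /* or */ inside it, so each pass peels off the innermost comments. A plain lazy match would stop at the first */ and could leave the tail of an outer comment behind. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NestedCommentStripper
{
    // Matches an innermost comment: "/*", then any characters that do not
    // begin another "/*" or "*/", then the closing "*/".
    // This is an illustrative variant, not the pattern given in the answer.
    private static readonly Regex Innermost =
        new Regex(@"/\*((?!/\*|\*/).)*\*/", RegexOptions.Singleline);

    public static string Strip(string input)
    {
        string previous;
        do
        {
            previous = input;
            // Each pass removes the comments that contain no nested comment.
            input = Innermost.Replace(input, "");
        } while (input != previous); // stop once a pass changes nothing
        return input;
    }
}
```

For example, `NestedCommentStripper.Strip("a /* x /* y */ z */ b")` first removes `/* y */` and then, on the next pass, the now un-nested outer comment.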
Denis de Bernardy
0

I also needed to ignore line comments of the form

// blablabla

So, in case someone else needs this too, extend the regex by appending |(//.*) at the end, so that the complete pattern becomes:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
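
A minimal sketch of how this pattern might be used from C#, under one assumed tweak: in .NET, `.` matches every character except `\n`, so `//.*` would also swallow the `\r` of a CRLF line ending. The version below uses `[^\r\n]*` for the line-comment part instead. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LineAndBlockComments
{
    // The pattern from the answer, with .* replaced by [^\r\n]* (an
    // assumption of this sketch) so the line break after a // comment is kept.
    private static readonly Regex Comments = new Regex(
        @"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//[^\r\n]*)");

    // Returns a copy of the input with both comment styles removed.
    public static string Strip(string input) => Comments.Replace(input, "");
}
```

As with the other regex-based answers, this will also cut into string literals that happen to contain // (URLs, for instance) or /*, so it is only safe on input where that cannot occur.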
Nachokhan