
What regular expression matches comments, i.e. all characters between /* and */, including the comment markers themselves, and spanning multiple lines?

For example, it should pick up:

/* asdf asdf 
asdf asdfasdfasdfasd
asdfasdf
   */
coreyward
Greg
  • "including these comment markers themselves" - congratulations, you need to use a context-aware parser instead! – Matt Feb 25 '11 at 01:35
  • Personally, I'd use a parser, but here's a link to regex that I believe will work: http://snipplr.com/view/7129/match-css-and-js-comments/. Link to another similar question: http://stackoverflow.com/questions/3984380/regular-expression-to-remove-css-comments – orangepips Feb 25 '11 at 01:36

3 Answers

(?:/\*(?:(?:[^*]|\*(?!/))*)\*/)

This was originally part of a MySQL parser, designed to strip comments without removing them from strings:

("(?:(?:(?:\\.)|[^"\\\r\n])*)"|'(?:(?:(?:\\.)|[^'\\\r\n])*)'|`(?:(?:(?:\\.)|[^`\\\r\n])*)`)|((?:-- .*)|(?:#.*)|(?:/\*(?:(?:[^*]|\*(?!/))*)\*/))

Each match is then replaced with capture group 1, which puts the strings back and drops the comments.
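As a rough illustration of that replacement step, here is a minimal Python sketch. The pattern is the one from this answer, split over two raw strings for readability; the function name `strip_comments` and the sample input are made up for the example, and Python's `re` module is an assumption (the original was part of a MySQL parser, not Python).

```python
import re

# Group 1: quoted strings ("...", '...', `...`), which must survive.
# Group 2: comments (-- ..., # ..., /* ... */), which are dropped.
PATTERN = re.compile(
    r'''("(?:(?:(?:\\.)|[^"\\\r\n])*)"|'(?:(?:(?:\\.)|[^'\\\r\n])*)'|`(?:(?:(?:\\.)|[^`\\\r\n])*)`)'''
    r'''|((?:-- .*)|(?:#.*)|(?:/\*(?:(?:[^*]|\*(?!/))*)\*/))'''
)


def strip_comments(text: str) -> str:
    """Replace every match with capture group 1 (empty when a comment matched)."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    # Hypothetical input: a comment marker inside a string, then real comments.
    sql = "SELECT '/* not a comment */' AS s, /* real\ncomment */ 1 -- trailing\n"
    print(repr(strip_comments(sql)))
```

Because the string alternatives come first, a `/*` inside a quoted string is consumed as part of the string and never gets a chance to start a comment match.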

Patrick
  • Fantastic, thank you. I think I'm glad I didn't persist in trying to work it out myself – Greg Feb 25 '11 at 01:36
  • This won't allow `*/` inside the comment – Matt Feb 25 '11 at 01:37
  • Interestingly, whilst this worked in an online regex tool, when I try to use it in Notepad++ it doesn't work :( – Greg Feb 25 '11 at 01:39
  • @Patrick - clearly, but I'm pointing out the fallacy in OP's requirements: "including these comment markers themselves" - which means this should be allowed: /* bla bla */ bla */ – Matt Feb 25 '11 at 01:40
  • sometimes text editors have their own syntax, like requiring the brackets to be backslashed as well – Patrick Feb 25 '11 at 01:41
  • sorry - I just meant the ones at the extremes, so Patrick's suggestion is fine - however the tool I was going to use it on in windows to do the replacement doesn't seem to work - any ideas for a free Windows regex replacement tool that would respect the regex pattern correctly? – Greg Feb 25 '11 at 01:42
  • Textpad requires you to escape the brackets, I'm not sure about Editpad. – Patrick Feb 25 '11 at 01:44
  • @Greg: "a free Windows regex replacement tool": http://perl.org/get.html You could use `perl -0777 -pe "s{/\*.*?\*/}{}gs" foo.txt` for simple cases http://stackoverflow.com/questions/5112618/what-is-the-regex-expression-to-identify-comments-i-e-between-and-across/5112847#5112847 – jfs Feb 25 '11 at 02:11

This is a very difficult problem to solve with a regex (since it is very hard to account for all the edge cases). If this is a programming language that you are parsing I would highly suggest that you use a parser built to parse that language.
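To give a feel for what even a tiny hand-written scanner looks like compared with a regex, here is a minimal Python sketch. It is not a parser for any real language: it assumes only double-quoted string literals with backslash escapes and `/* ... */` comments, and the function name `strip_block_comments` is made up for illustration.

```python
def strip_block_comments(text: str) -> str:
    """Remove /* ... */ comments, leaving double-quoted string literals intact."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy the whole string literal, skipping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            j = min(j + 1, n)  # include the closing quote if there is one
            out.append(text[i:j])
            i = j
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            # An unterminated comment runs to the end of the input.
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_block_comments('a = "/* keep */"; /* drop */ b'))
```

Every extra rule of the real language (other quote styles, line comments, nested comments, preprocessor directives) adds another branch here, which is the argument for using a parser that already knows the language.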

Andrew Hare

It is not that simple, e.g.:

/* multiline comment
   f("end marker inside literal string */");
*/

See "How do I use a regular expression to strip C style comments from a file?"
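To see what a plain comment regex does with that input, here is a small Python sketch using the pattern from the first answer. The variable names are made up for the example, and Python's `re` module is an assumption.

```python
import re

# Comment-only pattern from the first answer; it knows nothing about strings.
COMMENT = re.compile(r'(?:/\*(?:(?:[^*]|\*(?!/))*)\*/)')

source = '/* multiline comment\n   f("end marker inside literal string */");\n*/\n'

# The match ends at the first */, which here sits inside the string literal.
match = COMMENT.search(source)
matched_text = match.group(0)

# Whatever follows that first */ is left behind after the substitution.
leftover = COMMENT.sub("", source)

if __name__ == "__main__":
    print(repr(matched_text))
    print(repr(leftover))
```

The regex stops at the first `*/` it meets and leaves the rest of the line plus the final `*/` in the output. Whether that is the intended result depends on the language's comment rules, which the regex alone does not know.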

jfs