Suppose I have this text:

cat file
/* comment */ not a comment /* another comment */

/* delete this  *
/* multiline    *
/* comment      */

/*************
/* and this  *  
/************/
The End

I can use Perl with a conditional ? : in the replacement to delete only the multiline comments:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)/($1=~qr"\R") ? "" : $1/eg;' file

Prints:

/* comment */ not a comment /* another comment */




The End

Without the conditional, every comment is removed:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)//g;' file
 not a comment 




The End

Is there a way to delete only multiline C-style comments using just a regex, i.e., without the Perl conditional code in the replacement?

dawg
  • Thanks for that link. Unless I am missing something, it does not answer how to limit the match to only multiline comments. The closest thing I found was [THIS](https://stackoverflow.com/questions/6417436/perl-regex-remove-c-comments-with-specific-keywords) which is essentially the same as my conditional approach. – dawg Dec 22 '21 at 17:47
  • _"Some people, when confronted with a problem, think_ “I know, I'll use regular expressions.” _Now they have two problems."_ -- [Jamie Zawinski](http://www.jwz.org/) – Jim Garrison Dec 22 '21 at 17:54
  • *"Some people, when confronted with a problem, think* “I know, I'll ask a question on Stack Overflow about a regular expression” *Now they have received a very nice solution."* – The fourth bird Dec 22 '21 at 18:06
  • @sin: very, very fair points. I guess I was not trying to write a full comment compiler. C comments are potentially [perverse](https://stackoverflow.com/a/2394918/298607). I am just shooting for the 90% cases. – dawg Dec 22 '21 at 23:03
  • OK, sorry, it looks like the target is a script file; I misread it as a C language file. But it doesn't have to be that perverse for C/C++. Simple callback replacement logic is all that's needed, since this template matches it all: (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*) – sln Dec 23 '21 at 18:12

2 Answers

You can use

perl -0777 -pe 's~/\*(?:(?!\*/|/\*).)*\R(?s).*?\*/~~g' file

The pattern matches

  • /\* - a /* string
  • (?:(?!\*/|/\*).)* - zero or more chars other than line break chars, none of which starts a */ or /* char sequence
  • \R - a line break sequence
  • (?s) - now, . will also match line breaks
  • .*? - any zero or more chars as few as possible
  • \*/ - a */ substring.

See the regex demo.
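
For engines that lack \R or a mid-pattern (?s), the same idea can be sketched in Python's re module. This is an assumed port, not the answer's own code: \R is spelled out as (?:\r\n|\n|\r), the tempered dot is written as [^\r\n], the (?s). part becomes [\s\S], and the names MULTILINE_COMMENT and strip_multiline_comments are hypothetical.

```python
import re

# Hypothetical port of the pattern described above (not the original Perl).
MULTILINE_COMMENT = re.compile(
    r"/\*"                       # opening /*
    r"(?:(?!\*/|/\*)[^\r\n])*"   # same-line chars that do not start */ or /*
    r"(?:\r\n|\n|\r)"            # one line break (stands in for \R)
    r"[\s\S]*?"                  # any chars, as few as possible
    r"\*/"                       # closing */
)


def strip_multiline_comments(text: str) -> str:
    """Remove only the comments that contain at least one line break."""
    return MULTILINE_COMMENT.sub("", text)


if __name__ == "__main__":
    sample = "/* a */ x /* b */\n/* m1\n   m2 */\nend\n"
    print(strip_multiline_comments(sample))
```

Because the pattern requires a line break before the first */, a comment that closes on its own line can never match, so no replacement callback is needed.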

Wiktor Stribiżew

With a SKIP/FAIL approach:

perl -0777 -pe's~/\*\N*?\*/(*SKIP)^|/\*.*?\*/~~gs' file

demo

\N matches any character that isn't a line break.
The dot matches any character, including newlines, because the s flag is used.

The first branch matches "inline" comments and is forced to fail with ^ (shorter than writing (*F) or (*FAIL), with the same result here because the position after a closing */ is never the start of the string). The (*SKIP) backtracking control verb prevents the engine from retrying earlier positions, so the next attempt starts after the closing */.

The second branch matches the remaining comments, which are necessarily multiline.


A shorter variant, with the same two branches but this time using \K to exclude the consumed characters from the match result:

perl -0777 -pe's~/\*\N*?\*/\K|/\*.*?\*/~~gs' file

demo

This time the first branch succeeds, but since all characters before \K are removed from the match result, the remaining empty string is replaced with an empty string.


These two substitutions aren't very different from the more portable:

s~(/\*.*?\*/)|/\*[\s\S]*?\*/~$1~g

but with less effort (no capture group needed, empty replacement string).
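
To illustrate how the portable variant carries over to another engine, here is a minimal Python sketch. The names PATTERN and drop_multiline_comments are hypothetical, and a callback is used in place of $1 so the result does not depend on how the replacement template treats an unmatched group.

```python
import re

# Branch 1 captures a comment that opens and closes on one line
# (the dot does not match "\n" here, since DOTALL is not set).
# Branch 2 matches whatever comments remain, which must span lines.
PATTERN = re.compile(r"(/\*.*?\*/)|/\*[\s\S]*?\*/")


def drop_multiline_comments(text: str) -> str:
    """Keep single-line comments, delete multiline ones."""
    # Put group 1 back when branch 1 matched; otherwise emit nothing.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    sample = "/* a */ x /* b */\n/* m1\n   m2 */\nend\n"
    print(drop_multiline_comments(sample))
```

The order of the branches matters: the single-line branch has to come first so that inline comments are consumed and restored before the multiline branch gets a chance to match them.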

Casimir et Hippolyte