
I'm refactoring a very large C project and need to find the parts of the code written by a specific programmer. Fortunately, every developer on the project marks their own code with their email address in a standard C-style comment.

Someone could say this is easy to do with grep from the command line, but that is not my goal. I may need to remove these comments or replace them with other text, so I need a regex that matches the whole comment.

Example:

/*********************************************
 *
 * ... some text ....
 *
 * author: user@domain.com
 *
 *********************************************/

From this post I found an expression that matches C-style comments:

\/\*(\*(?!\/)|[^*])*\*\/

But that is not enough! I only need the comments that contain a specific email address. Fortunately, the domain of that address appears to be unique across the whole project, which should make this simpler.

I think I need a positive lookahead assertion. I've tried this one:

(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)

but it doesn't work. Any advice?
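Running the attempt against the example comment shows why it fails. The sketch below uses Python's `re` module, which is an assumption because the question does not name a regex engine. By default `.` does not match a newline. The lookahead `(?=.*domain.com)` after each non-`*` character therefore only searches the rest of the current line. The loop stalls at the first line that does not itself contain `domain.com`, and the match fails. With DOTALL enabled the opposite problem occurs: the lookahead can scan past `*/` into later text, so a comment without the address can still match. The sample strings are made up for illustration.

```python
import re

# The attempted pattern from the question, unchanged (Python accepts "\/").
ATTEMPT = r"(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)"

EXAMPLE = """/*********************************************
 *
 * ... some text ....
 *
 * author: user@domain.com
 *
 *********************************************/"""

# Without DOTALL, '.' stops at a newline, so the lookahead fails as soon as
# the loop reaches a line that does not contain "domain.com".
print(re.search(ATTEMPT, EXAMPLE))  # None

# With DOTALL the lookahead can look past "*/", so a comment WITHOUT the
# address matches because the address appears further down in the text.
TWO_COMMENTS = "/* no address here */\n/* author: user@domain.com */"
m = re.search(ATTEMPT, TWO_COMMENTS, re.DOTALL)
print(m.group(0))  # /* no address here */
```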

Edited by Community; asked by Sbraaa
  • [`\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/`](https://regex101.com/r/nW8uP2/1)? – Wiktor Stribiżew May 22 '16 at 13:10
  • `\/\*.*author: .*@domain\.com.*?\*\/` should match. – Saleem May 22 '16 at 13:14
  • @Saleem, [it will overfire](https://regex101.com/r/mV2bU2/1), do not rely on `.*` when you deal with matching inside a marked up text. – Wiktor Stribiżew May 22 '16 at 13:17
  • @WiktorStribiżew you are correct. One can easily make a mistake and it will overfire, but as you can see, I'm using `.*?` (non-greedy) at the end, just before `*/`. Here is my test case: https://regex101.com/r/mV2bU2/2 – Saleem May 22 '16 at 13:19
  • As @WiktorStribiżew pointed out, my regex can potentially overfire, so here is another version with a fix: `\/\*.*author: .*@domain\.com.*?\*\/\s`; see https://regex101.com/r/mV2bU2/3 – Saleem May 22 '16 at 13:26
  • Why do you dismiss grep as a tool of choice? It's regex-based after all, and it would give you a survey of where the author's email occurs (e.g. in string literals). To actually substitute content, you might use the CLI tool sed, which matches lines by regex too. – collapsar May 22 '16 at 13:38
  • @WiktorStribiżew : Thank you! your solution runs very well! – Sbraaa May 22 '16 at 14:07
  • @collapsar : I could use grep + sed, but I think it is simpler to use the multiline regex search/replace feature provided by some code editors – Sbraaa May 22 '16 at 14:18
  • Stackoverflow is _not_ a forum. Read the [faq] and note that we do not put [solved] and other stuff in the title. There is a checkmark next to each answer that you're supposed to use to indicate which answer solved your problem. – dandan78 Jun 22 '16 at 08:05
  • You're right, sorry for the mistake. However, how can I mark an answer when users reply with comments instead of posting a regular answer? Have I missed something? – Sbraaa Jun 22 '16 at 15:27

1 Answer


You can use

\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
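The question's goal is to find these comments and then remove or replace them. The sketch below shows both steps with Python's `re` module. Python is an assumption here; other backtracking engines with lookahead, such as PCRE, should treat the pattern the same way. The sample source and the replacement text are hypothetical.

```python
import re

# Pattern from the answer: a C comment that contains "@domain.com"
# ("domain.com" is the placeholder domain used in the question).
PATTERN = re.compile(
    r"\/\*[^*]*(?:\*(?!\/)[^*]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)

# Hypothetical C source: only the second comment carries the address.
source = """/* author: someone@other.org */
int a;
/*
 * author: user@domain.com
 */
int b;
"""

# Find the matching comments (no capture groups, so findall returns whole matches)...
print(PATTERN.findall(source))
# ...or replace them with some other text (a made-up placeholder here).
print(PATTERN.sub("/* author: removed */", source))
```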

See the regex demo

Pattern details:

  • \/\* - comment start
  • [^*]*(?:\*(?!\/)[^*]*)* - everything but */
  • @domain\.com - literal @domain.com (the dot is escaped)
  • [^*]*(?:\*(?!\/)[^*]*)* - everything but */
  • \*\/ - comment end
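The pattern details above map directly onto a pattern assembled piece by piece. The sketch below does this in Python, with the hypothetical helper `comment_pattern_for` building the pattern for any domain. `re.escape` escapes the dot, so `domain.com` cannot match `domainXcom`. Python does not require `/` to be escaped. The `\/` in the answer only matters where `/` delimits the pattern, as in JavaScript or PHP regex literals.

```python
import re


def comment_pattern_for(domain: str) -> re.Pattern:
    """Hypothetical helper: compile the answer's pattern for any domain."""
    # Unrolled "everything but */": a run of non-stars, then any number of
    # (a star not followed by '/', plus another run of non-stars).
    body = r"[^*]*(?:\*(?!/)[^*]*)*"
    return re.compile(
        r"/\*"                     # comment start
        + body                     # everything but */
        + "@" + re.escape(domain)  # literal @domain, with the dot escaped
        + body                     # everything but */
        + r"\*/"                   # comment end
    )


pat = comment_pattern_for("domain.com")
print(bool(pat.search("/* author: user@domain.com */")))    # True
print(bool(pat.search("/* author: user@domainXcom */")))    # False: '.' is escaped
print(bool(pat.search("/* a */ user@domain.com /* b */")))  # False: address is outside comments
```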

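A faster alternative, in which the part before the address matches everything except the comment end and @domain.com (any @ it consumes must not be followed by domain.com), so the engine does not have to backtrack out of a long run to find the address: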

\/\*[^*@]*(?:\*(?!\/)[^*@]*|@(?!domain\.com)[^*@]*)*@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
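The `@(?!domain\.com)` branch is what allows other email addresses to appear before the target one in the same comment. The Python sketch below checks that case. The sample comments are made up for illustration.

```python
import re

# The faster pattern from the answer, split over two raw strings for width.
FAST = re.compile(
    r"\/\*[^*@]*(?:\*(?!\/)[^*@]*|@(?!domain\.com)[^*@]*)*"
    r"@domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/"
)

# Another address appears first; the lookahead lets the first part skip it.
comment = """/*
 * reviewer: someone@other.org
 * author: user@domain.com
 */"""
print(FAST.search(comment).group(0) == comment)          # True
print(FAST.search("/* reviewer: someone@other.org */"))  # None
```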

See another demo

In these patterns, I used the unrolled form of (\*(?!\/)|[^*])*, namely [^*]*(?:\*(?!\/)[^*]*)*. Unrolling replaces the per-character alternation with runs of [^*], which helps build more efficient patterns.
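Both forms describe the same set of strings. Unrolling changes how much work a backtracking engine does per character, not what matches. The Python sketch below checks that on a small made-up C snippet containing stars inside comments and an empty `/**/` comment. It uses `finditer` because the rolled form has a capture group, and with a single group `findall` would return only that group's last iteration instead of the whole comment.

```python
import re

# Rolled form from the question: one alternation per character.
ROLLED = re.compile(r"\/\*(\*(?!\/)|[^*])*\*\/")
# Unrolled form from the answer: runs of non-stars, then star + run, repeated.
UNROLLED = re.compile(r"\/\*[^*]*(?:\*(?!\/)[^*]*)*\*\/")

code = """/*****
 * header ** with stars
 *****/
int x; /* short */ int y; /**/
"""

# Compare the whole matches (group 0) found by each pattern.
rolled = [m.group(0) for m in ROLLED.finditer(code)]
unrolled = [m.group(0) for m in UNROLLED.finditer(code)]
print(rolled == unrolled)  # True
print(unrolled)
```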

Answered by Wiktor Stribiżew