
The issue is I want to match all of the text on either side of the comment and exclude the comment itself.

There are plenty of comment-related regex posts, but most are for other languages (I am using Notepad++, which Wikipedia tells me uses POSIX ERE; let's not discuss languages or tools), and most focus on finding the comments, which I have already done.

This will find the encompassing text I desire (this will include the internal block comment in the match):

(^)rule ((.|\n|\r)*?)(^)end

The above finds anything between 'rule' and 'end', inclusive. Fine.

This will find the block comment:

(?:/\*(?:(?:[^*]|\*(?!/))*)\*/)

The above finds anything between /* and */, inclusive. Fine. I am not concerned if there might be one of the */ inside the comment, not an issue in my case.
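
As a quick sanity check outside Notepad++, here is a minimal sketch that runs both patterns unchanged through Python's `re` module. The sample text and the variable names are made up for illustration, and `re.MULTILINE` is assumed so that `^` matches at the start of each line.

```python
import re

# The two patterns from the question, unchanged.
# MULTILINE makes ^ match at the start of every line.
RULE_RE = re.compile(r"(^)rule ((.|\n|\r)*?)(^)end", re.MULTILINE)
COMMENT_RE = re.compile(r"(?:/\*(?:(?:[^*]|\*(?!/))*)\*/)")

# Hypothetical sample input: one rule with a block comment inside it.
sample = "rule r1\n  a = 1 /* block\n  comment */ b = 2\nend\n"

# Whole rule..end span, comment included.
block = RULE_RE.search(sample).group(0)

# Block comments found inside that span (the pattern has no capturing
# groups, so findall returns the full matches).
comments = COMMENT_RE.findall(block)
```

Here `block` still contains the comment, which is exactly the part I want left out of the match.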

Now the question is how do I put the block comment into a negative in the middle of the positive rule match above, so that it matches everything between RULE and END except for commented text?

Bonus points if your answer excludes single line // comments as well.

weberc2
  • What are you trying to do? Would it be possible to just use your block comment matching pattern and then delete matches? – Michael Myers Jan 16 '13 at 16:59
  • Actually, the latest versions of Notepad++ are using Philip Hazel's powerful and robust [PCRE regex library](http://www.pcre.org/), which provides advanced Perl 5 regex constructs. The info in the Wiki might be a little stale. Regarding your problem at hand: this likely cannot be done with a single regex (depending upon what source code language you are editing). – ridgerunner Jan 16 '13 at 17:10
  • Before posting my thought was to use the positive comment match and eliminate the comments, which is what we will do, since our sample set is small enough. I found similar posts that also said that this situation is bumping up against the limits of what regex can do. Thanks for replies. cheers! – user1984308 Jan 16 '13 at 17:54
  • I'm not familiar with notepad++ but if @ridgerunner is right and you can use PCRE library then you can do this using a single regex. I myself have recently answered 2 questions here with recursive patterns, both using lookaround rules inside. – inhan Jan 16 '13 at 21:56

2 Answers


Let me start by saying: regex is not made to do this!

But it's not impossible: it can be done with a recursive regex:

  • Match everything from "rule" up to "end" OR up to the start of a comment block, after which match everything up to "end" OR up to the start of the next comment block, after which match everything up to "end" OR etc.

capturing only the 'everything' parts, of course.

Which translates to:

^rule((?:.|\r|\n)*?)(?:^end|(?:(?://$|/\*(?:(?:[^*]|\*(?!/))*)\*/)))
                                                                  ^
                                                             put cursor there
                                                              and insert
                           ((?:.|\r|\n)*?)(?:^end|(?:(?://$|/\*(?:(?:[^*]|\*(?!/))*)\*/)))
                                                                or end with
                           (?:\r?\n^end)

then replace with

$1$2$3$4$..

where the number of substitutions should match the number of recursions
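
For anyone who wants to try the nesting outside Notepad++, here is a rough sketch in Python that builds the nested pattern programmatically instead of pasting it by hand. It is an illustration under assumptions, not the exact pattern above: the helper names (`build_pattern`, `rule_bodies`) are made up, `[\s\S]*?` stands in for `(?:.|\r|\n)*?`, the line-comment branch is written as `//[^\r\n]*` so that it consumes the comment text, and the innermost level ends with a plain `^end`.

```python
import re
from typing import List

TEXT = r"([\s\S]*?)"  # lazy capture; same intent as ((?:.|\r|\n)*?)
# Assumption: line comments are matched as //[^\r\n]* (not the //$ shown above).
COMMENT = r"(?://[^\r\n]*|/\*(?:[^*]|\*(?!/))*\*/)"


def build_pattern(depth: int) -> "re.Pattern":
    """Nest the 'comment, then more text' step `depth` times."""
    tail = r"^end"  # innermost level: only "end" may follow
    for _ in range(depth):
        tail = r"(?:^end|" + COMMENT + TEXT + tail + r")"
    return re.compile(r"^rule" + TEXT + tail, re.MULTILINE)


def rule_bodies(source: str, depth: int) -> List[str]:
    """Return the text between rule and end, minus up to `depth` comments."""
    pattern = build_pattern(depth)
    # Groups on branches that were not taken are None, so treat them as "".
    return ["".join(g or "" for g in m.groups()) for m in pattern.finditer(source)]


# Hypothetical sample: one block comment and one line comment in a rule.
sample = "rule r1\n  a /* x */ b // tail\n  c\nend\n"
bodies = rule_bodies(sample, depth=3)
```

Joining the groups plays the role of the `$1$2$3...` replacement. If a rule contains more comments than `depth`, the last capture simply runs on to `end` and the extra comments leak into the result, which is why the nesting count has to cover the worst case.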

To test the limits of Notepad++ I created this fiddle:

http://jsfiddle.net/lovinglobo/wPKjb/

Notepad++ breaks on more than 29 recursions by simply saying "invalid regular expression".

Lodewijk Bogaards

If you are able to flip your requirement and instead delete all comments from the source, you could use this pattern to match comments (both block and line):

/(\/\*).*?(\*\/)|(\/\/).*?(\n)/s
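
A minimal sketch of that deletion in Python, assuming the `/s` flag maps to `re.DOTALL` and with the surrounding `/` delimiters dropped. The function name and the sample text are made up for illustration, and putting back the newline that ends a `//` comment is my own choice so that lines do not get joined together.

```python
import re

# Same alternation as above; DOTALL lets .*? run across line breaks.
COMMENT_RE = re.compile(r"(/\*).*?(\*/)|(//).*?(\n)", re.DOTALL)


def delete_comments(source: str) -> str:
    """Remove block and line comments from source."""
    # Group 4 is the newline that ends a // comment; keep it.
    # For block comments group 4 is None, so nothing is put back.
    return COMMENT_RE.sub(lambda m: m.group(4) or "", source)


# Hypothetical sample input.
sample = "rule r1\n  a /* x\n y */ b // tail\n  c\nend\n"
cleaned = delete_comments(sample)
```

Note that a `//` comment on the very last line with no trailing newline would not be matched, and a `//` inside a string literal (a URL, say) would be treated as a comment.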
Daedalus