1

I'm trying to write a negative lookbehind regex that avoids matching certain lines of code in my Java file. I can match the phrase with "(?<=//).*getMessage.*". This expression matches line #1 in the code below:

// System.out.println (obj1.getMessage());  //line1
/* System.out.println (obj.getMessage());*/ //line2
/* public void test() {                     //line3
   System.out.println (obj2.getMessage());  //line4
   }                                        //line5
*/
public void test() {
      System.out.println (obj5.getMessage()); //line 6
 }

But when I tried to negate this using "(?<!//).*getMessage.*", it still matched lines #1, #2 and #4 as well.
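A minimal sketch of why the negated pattern still matches line #1, assuming `java.util.regex` semantics (the class and method names are made up for illustration). The lookbehind is checked only at the position where the match starts, and the leading `.*` is free to consume the `//` itself, so a match can begin at column 0, where nothing precedes it.

```java
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Returns true if the regex matches anywhere in the given line.
    static boolean found(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        String commented = "// System.out.println (obj1.getMessage());";
        String live = "      System.out.println (obj5.getMessage());";

        // Positive lookbehind: the match must start right after "//".
        System.out.println(found("(?<=//).*getMessage.*", commented)); // true
        System.out.println(found("(?<=//).*getMessage.*", live));      // false

        // Negative lookbehind: the match may start at index 0, which is not
        // preceded by "//", and ".*" then swallows the comment marker.
        System.out.println(found("(?<!//).*getMessage.*", commented)); // true
        System.out.println(found("(?<!//).*getMessage.*", live));      // true
    }
}
```

So the lookbehind alone cannot express "not inside a comment"; it only constrains the two characters before the match start.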

My actual requirement is to match the getMessage call at line #6 and ignore the other places where getMessage is called inside comments.

It would be great if someone can assist me in finding the right expression.

PS: I can't access the Java files myself. I can only pass the regex to a form and tick the checkbox that selects all the Java files.

Burkhard
  • You'd probably need to use two expressions, especially for finding comments spanning multiple lines, i.e. between `/*` and `*/`. Note that applying a regular expression to a non-regular domain (such as almost every language is) will always be an approximation only. – Thomas Aug 21 '14 at 12:02

3 Answers

2

I would consider using the following regex to delete all the comments and then use a simple regex to find all getMessage() calls.

REGEX: ~/(?:/.*?$|\*.*?\*/)~

String regex = "(?ms)/(?:/.*?$|\\*.*?\\*/)";

Please note the s flag, which makes the . match newline characters too. Instead of the . you can use a character class combined with its complement, so something like [\w\W] in place of the second . would also work.
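A rough sketch of the two-step idea in Java, assuming the source is available as a single string: first strip comments with the regex above, then search what is left. The class name, the `liveCalls` helper and the `\w+\.getMessage\(\)` call pattern are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripCommentsDemo {
    // Comment regex from this answer; (?ms) turns on MULTILINE and DOTALL.
    static final Pattern COMMENT = Pattern.compile("(?ms)/(?:/.*?$|\\*.*?\\*/)");
    // Hypothetical "simple regex" for the second step.
    static final Pattern CALL = Pattern.compile("\\w+\\.getMessage\\(\\)");

    // Removes comments, then collects every getMessage() call that remains.
    static List<String> liveCalls(String source) {
        String stripped = COMMENT.matcher(source).replaceAll("");
        List<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(stripped);
        while (m.find()) {
            calls.add(m.group());
        }
        return calls;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "// System.out.println (obj1.getMessage());",
            "/* System.out.println (obj.getMessage());*/",
            "/* public void test() {",
            "   System.out.println (obj2.getMessage());",
            "   }",
            "*/",
            "public void test() {",
            "      System.out.println (obj5.getMessage());",
            " }");
        System.out.println(liveCalls(source)); // [obj5.getMessage()]
    }
}
```

This does not account for comment markers inside string literals, which the comments below discuss.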

skamazin
  • @Thomas I tried to make a regex that would catch a comment inside quotes, and **[this](http://regex101.com/r/tN8oQ8/5)** is what I've got so far. Unfortunately, this means you can't have quotes in a comment at all, so it's far from perfect. I think I need a negative lookbehind, but I haven't fully grasped the syntax for it – skamazin Aug 21 '14 at 15:20
  • @skamazin actually a programming language is a non-regular problem domain, and as such regular expressions have their limitations. To be absolutely sure you'd need to use a more appropriate representation, e.g. an AST. – Thomas Aug 21 '14 at 15:28
  • This regex would delete a wanted `getMessage()` in this scenario: http://regex101.com/r/qP4lK8/1 (an edge case, I agree, but my answer below handles it correctly) – asontu Aug 25 '14 at 08:38
  • @funkwurm There's bound to be many edge cases I didn't account for. If OP wants to be 100% sure that there are no errors, he should use some predefined parser instead of a single regex. – skamazin Aug 25 '14 at 16:22
1

What if there's a string that contains getMessage? ;)

This is what I would do, then extract every match where group 2 is set:

(['"])(?:(?!\1|\\).|\\.)*\1|\/\/[^\n]*(?:\n|$)|\/\*(?:[^*]|\*(?!\/))*\*\/|(getMessage\(\))

(This is an adaptation of a more general approach that I posted and explain further here: Regex for comments in strings, strings in comments, etc)
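A sketch of how the expression above might be driven from Java, under a few assumptions: the `\/` escapes are written as plain `/` (Java has no regex delimiters), the class and method names are invented for illustration, and the result is reported as character offsets. Strings and comments are consumed by the first three alternatives, so group 2 is only set for a `getMessage()` that sits in live code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenScanDemo {
    // Alternatives: quoted string | line comment | block comment | (getMessage())
    static final Pattern TOKEN = Pattern.compile(
        "(['\"])(?:(?!\\1|\\\\).|\\\\.)*\\1"   // group 1: opening quote of a literal
        + "|//[^\\n]*(?:\\n|$)"                // line comment
        + "|/\\*(?:[^*]|\\*(?!/))*\\*/"        // block comment
        + "|(getMessage\\(\\))");              // group 2: the call we want

    // Returns the start offset of every getMessage() outside strings and comments.
    static List<Integer> liveCallOffsets(String source) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group(2) != null) {   // null when a string or comment matched
                offsets.add(m.start(2));
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "// System.out.println (obj1.getMessage());",
            "/* System.out.println (obj2.getMessage()); */",
            "String s = \"obj3.getMessage()\";",
            "System.out.println (obj5.getMessage());");
        // Only the call on the last line is reported.
        System.out.println(liveCallOffsets(source).size()); // 1
    }
}
```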

asontu
1

Instead of embedding the flags in the String literal as in skamazin's answer, you can pass them as Pattern flags when compiling:

Pattern regex = Pattern.compile("/(?:/.*?$|\\*.*?\\*/)", Pattern.MULTILINE | Pattern.DOTALL);
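As a small check, the sketch below compiles the same expression both ways and strips comments from a sample string. The class name, the `strip` helper and the sample input are assumptions made for the example. Note that in a Java string literal the regex escape `\*` has to be written as `\\*`.

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    static final String BODY = "/(?:/.*?$|\\*.*?\\*/)";

    // Flags embedded in the expression itself.
    static final Pattern INLINE = Pattern.compile("(?ms)" + BODY);
    // Same flags passed as the second argument of Pattern.compile.
    static final Pattern FLAGGED =
        Pattern.compile(BODY, Pattern.MULTILINE | Pattern.DOTALL);

    // Deletes everything the given pattern matches.
    static String strip(Pattern p, String source) {
        return p.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "int a; // one\n/* two\nlines */int b;";
        // Both variants remove the line comment and the block comment.
        System.out.println(strip(INLINE, source).equals(strip(FLAGGED, source))); // true
    }
}
```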

Unihedron