
I need to build a regular expression that finds the word "int" only when it is not part of a string literal.

I want to find out whether int is used in the code itself (not inside a string, only in regular code).

Example:

int i;  // the regex should find this one.
String example = "int i"; // the regex should ignore this line.
logger.i("int"); // the regex should ignore this line. 
logger.i("int") + int.toString(); // the regex should find this one (because of the second int)

thanks!

Adibe7

5 Answers


It's not going to be bullet-proof, but this works for all your test cases:

(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)

It uses a lookbehind and a lookahead to assert that there are either zero or exactly two double-quote characters (") before the match and after it.

Here's the code in Java with the output:

    public class Main {
        public static void main(String[] args) {
            String regex = "(?<=^([^\"]*|[^\"]*\"[^\"]*\"[^\"]*))\\bint\\b(?=([^\"]*|[^\"]*\"[^\"]*\"[^\"]*)$)";
            System.out.println(regex);
            String[] tests = new String[] {
                    "int i;",
                    "String example = \"int i\";",
                    "logger.i(\"int\");",
                    "logger.i(\"int\") + int.toString();" };

            for (String test : tests) {
                // matches() must cover the whole line, hence the ^.* and .*$ around the pattern
                System.out.println(test.matches("^.*" + regex + ".*$") + ": " + test);
            }
        }
    }

Output (the first line is the regex itself, printed without the Java string escapes so you can read it):

(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
true: int i;
false: String example = "int i";
false: logger.i("int");
true: logger.i("int") + int.toString();
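Some regex engines only accept fixed-width lookbehind (Python's re module, for example), so a pattern with * inside the lookbehind will not compile everywhere. A variant that needs only a lookahead relies on the observation that, on a single line with balanced quotes, an int that sits outside a string literal is followed by an even number of " characters. This is a swapped-in technique, not the answer's regex; the names EvenQuoteCheck and containsCodeInt are hypothetical, and like the original it assumes no escaped quotes and no quotes inside comments or char literals.

```java
import java.util.regex.Pattern;

// Sketch: "int" is treated as code if an even number of '"' follow it on the line.
// Assumes no escaped quotes (\"), no '"' char literals and no quotes in comments.
class EvenQuoteCheck {
    // \bint\b, then zero or more "..." pairs, then no further quote until the end
    static final Pattern INT_OUTSIDE_STRING =
            Pattern.compile("\\bint\\b(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static boolean containsCodeInt(String line) {
        return INT_OUTSIDE_STRING.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] tests = {
                "int i;",
                "String example = \"int i\";",
                "logger.i(\"int\");",
                "logger.i(\"int\") + int.toString();" };
        for (String test : tests) {
            System.out.println(containsCodeInt(test) + ": " + test);
        }
    }
}
```

Because it accepts any even number of following quotes, a line with several string literals before the int (for example f("a", "b"); int x;) is also reported.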

Using a regex is never going to be 100% accurate; for that you need a language parser. Consider escaped quotes inside strings ("foo\"bar"), inline comments (/* foo " bar */), and so on.
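To make the parser suggestion concrete, here is a minimal sketch of a hand-written scanner, assuming Java-like syntax. It is not a real Java lexer (it ignores text blocks and unicode escapes, for example); it skips string and char literals while honoring backslash escapes, skips // and /* */ comments, and reports whether the token int appears in what remains. CodeIntScanner and containsCodeInt are hypothetical names.

```java
// Sketch of a tiny scanner, not a complete Java lexer.
class CodeIntScanner {
    static boolean containsCodeInt(String src) {
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a string or char literal, honoring backslash escapes.
                char quote = c;
                i++;
                while (i < n && src.charAt(i) != quote) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: skip to the end of the line.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                // Block comment: jump past the closing */ (or to the end if unclosed).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Read a whole identifier so "print" or "integer" is not counted.
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                if (src.substring(start, i).equals("int")) {
                    return true;
                }
            } else {
                i++;
            }
        }
        return false;
    }
}
```

Because the scanner consumes whole literals and comments, it handles the cases mentioned above: "foo\"int" and /* int " */ are skipped, and a char literal such as '"' does not throw off the quote tracking.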

Bohemian

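I'm not exactly sure what your complete requirements are, but perhaps something like:

^\s*\bint\b

It matches int only at the start of a line, optionally after leading whitespace.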

MeBigFatGuy

Assuming the input is processed one line at a time:

^int\s[\$_a-zA-Z\;]*$

It follows basic variable naming rules. :)
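A minimal sketch of applying this one line at a time in Java: split the source on line breaks and require each line to match the pattern in full. It assumes the letter range in the character class is meant to cover a-z and A-Z, so that a declaration such as int i; matches. DeclarationLines and matchingLineNumbers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DeclarationLines {
    // Assumption: the character class is meant to allow the letters a-z and A-Z.
    static final Pattern INT_DECLARATION = Pattern.compile("^int\\s[\\$_a-zA-Z;]*$");

    // Splits on any line break (\R, Java 8+) and returns the 1-based numbers
    // of the lines that match the pattern in full.
    static List<Integer> matchingLineNumbers(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int k = 0; k < lines.length; k++) {
            if (INT_DECLARATION.matcher(lines[k]).matches()) {
                hits.add(k + 1);
            }
        }
        return hits;
    }
}
```

Splitting on \R rather than "\n" keeps Windows line endings (\r\n) from leaving a stray \r that would make matches() fail.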

Ratna Dinakar

If you want to parse the code and search for int as an isolated word, this works:

(^int|[\(\ \;,]int)

It finds int when it is the first word of a line or is preceded by a space, a comma, a semicolon or a left parenthesis, the only characters that can precede it in code.

You can try it out and refine it at http://www.regextester.com/

PS: this works for all your test cases.
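In Java, one way to run this over a whole file at once is to compile it with Pattern.MULTILINE, so that ^ matches at the start of every line rather than only at the start of the input. The sketch below uses the same pattern with its optional backslash escapes dropped and reports the 1-based line numbers that contain a match; IntFinder and matchingLines are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntFinder {
    // Same pattern as above without the unneeded escapes; MULTILINE lets ^
    // match right after every line break, not only at the start of the input.
    static final Pattern INT_USE = Pattern.compile("(^int|[( ;,]int)", Pattern.MULTILINE);

    // Returns the 1-based numbers of the lines containing at least one match.
    static List<Integer> matchingLines(String source) {
        List<Integer> result = new ArrayList<>();
        Matcher m = INT_USE.matcher(source);
        while (m.find()) {
            // Count line breaks before the match to get its line number.
            int line = 1;
            for (int k = 0; k < m.start(); k++) {
                if (source.charAt(k) == '\n') {
                    line++;
                }
            }
            if (result.isEmpty() || result.get(result.size() - 1) != line) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Run on the four example lines joined with "\n", this reports lines 1 and 4.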

Sebastiano Merlino

^[^"]*\bint\b

should work. I can't think of a situation where the int keyword can validly appear after the character '"'. Of course, this only applies if the code is limited to one statement per line.
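A minimal Java sketch of this per-line check, reading the leading anchor as the start-of-line ^ (an assumption). NoQuoteBeforeInt and containsCodeInt are hypothetical names.

```java
import java.util.regex.Pattern;

class NoQuoteBeforeInt {
    // Assumption: the leading anchor is ^ (start of the line being checked).
    // [^"]* cannot cross a double quote, so only an int that appears before
    // the first '"' on the line can match.
    static final Pattern INT_BEFORE_ANY_QUOTE = Pattern.compile("^[^\"]*\\bint\\b");

    static boolean containsCodeInt(String line) {
        return INT_BEFORE_ANY_QUOTE.matcher(line).find();
    }
}
```

On the question's fourth sample line this returns false: a " appears before the second int, so [^"]* cannot reach it.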

jzilla