0

I'm running into the problem that my Java program finds the searched keyword inside longer words. For example, I'll try to find all for loops, but will also stumble upon formula. Most of the suggestions I've found talk about using regular expression searches like

String regex = "\\b"+keyword+"\\b";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(searchString);

or some variant of this. The issue I'm running into is that I'm crawling through code, not a book-like text where there are spaces on either side of every word. For example, this will miss for(, which I would like to find. Is there another clever way to find whole words only?

Edit: Thanks for the suggestions. How about cases in which the keyword starts at the very beginning of the string? For example,

class Vec {
public:
   ...
};

where I'm searching for class (or alternatively public). The patterns suggested by Thanga, Austin Lee, npinti, and Kai Iskratsch do not work in this case. Any ideas?

mjswartz

4 Answers

2

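In your case, the relevant construct is \b, a zero-width word-boundary assertion: it matches at the beginning or end of the string and wherever a word character (letter, digit, or underscore) sits next to a non-word character such as whitespace or punctuation. An opening parenthesis is a non-word character, so \bfor\b should already match the for in for(; if it does not in your program, the parentheses can also be listed explicitly.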

The easiest way to fix this would be to replace "\\b"+keyword+"\\b" with "[\\b(]"+keyword+"[\\b)]".

In regex syntax, the square brackets denote a set of which the regex engine will attempt to match any character it contains.

As per this previous SO question, it would seem that \b and [\b] are not the same. Whilst \b represents a word boundary, [\b] represents a backspace character. To fix this, simply replace "\\b"+keyword+"\\b" with "(\\b|\\()"+keyword+"(\\b|\\))".
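As a minimal sketch of how the plain word-boundary pattern behaves on code-like input, the following can be run as is. The class name WordBoundaryDemo and the helper findWholeWord are made up for illustration, and wrapping the keyword in Pattern.quote is an added assumption (it only matters if the keyword contains regex metacharacters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: returns the start offset of every whole-word
    // occurrence of keyword in text.
    static List<Integer> findWholeWord(String keyword, String text) {
        // Pattern.quote treats the keyword as a literal (assumption: the
        // keyword is plain text, not itself a regex).
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b");
        Matcher matcher = pattern.matcher(text);
        List<Integer> starts = new ArrayList<>();
        // find() scans for matches anywhere in the text, unlike matches(),
        // which requires the whole input to match.
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "for(" is found at offset 0; the "for" inside "formula" is skipped.
        System.out.println(findWholeWord("for", "for(int i = 0; i < n; i++) formula();")); // prints [0]
        // A keyword at the very start of the input is found as well.
        System.out.println(findWholeWord("class", "class Vec {\npublic:\n};")); // prints [0]
    }
}
```

Note that the loop uses find(); calling matches() with the same pattern would only succeed if the entire input were exactly the keyword.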

Community
npinti
  • I think you will have to escape the brackets too. Also, depending on what language the program code is in, you will have to add more exceptions. An option could be [\\b\\W]+keyword+[\\b\\W] (\W matches any non-word character) – Kai Iskratsch Jan 07 '16 at 14:59
  • No need of `[\b(]` as `(` is not considered a word character. – anubhava Jan 07 '16 at 15:02
1

The regex should allow zero or more characters on either side of the keyword. With the change below, the pattern matches the entire string whenever it contains the keyword:

String regex = ".*("+keyword+").*";
Thanga
0

You could modify your regex to also require a non-word character on either side of the keyword, for example "[^\\w]" + "for" + "[^\\w]", using the Pattern class in Java.

For your reference: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
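A rough sketch of this idea follows. It swaps the consuming [^\w] classes for lookarounds, (?<!\w) and (?!\w), which is a substitution rather than the pattern given above: the lookarounds only check the neighbouring characters without consuming them, so a keyword at the very start or end of the input, or two keywords separated by a single character, should still be found. The names DelimiterSearch and countWholeWord are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSearch {
    // Hypothetical helper: counts occurrences of keyword that are not
    // directly preceded or followed by a word character.
    static int countWholeWord(String keyword, String text) {
        String regex = "(?<!\\w)" + Pattern.quote(keyword) + "(?!\\w)";
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two real "for" keywords; the one inside "formula" is not counted.
        System.out.println(countWholeWord("for", "for for(formula)")); // prints 2
        // The keyword sits at the very beginning of the input.
        System.out.println(countWholeWord("class", "class Vec {\npublic:\n};")); // prints 1
    }
}
```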

0

Basically you will have to adapt your regex to all the possible patterns it can find. But considering you're actually dealing with code, you are better off building a parser/tokenizer for that language, or using one that already exists. Then all you have to do is run through the tokens to find the ones you want.
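To illustrate the token-based approach, here is a deliberately tiny sketch. It assumes that a token is just a run of letters, digits, and underscores, and it ignores comments, string literals, and everything else a real tokenizer for the target language would handle. TinyTokenizer, tokenize, and countKeyword are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyTokenizer {
    // Assumption: a word-like token is a run of letters, digits, or underscores.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9_]+");

    // Splits the source into word-like tokens, dropping all punctuation.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Counts tokens that are exactly equal to the keyword.
    static int countKeyword(String keyword, String source) {
        int count = 0;
        for (String token : tokenize(source)) {
            if (token.equals(keyword)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "for(int i = 0; i < n; i++) formula();";
        System.out.println(tokenize(source));            // prints [for, int, i, 0, i, n, i, formula]
        System.out.println(countKeyword("for", source)); // prints 1
    }
}
```

Because each token is compared with equals, formula never counts as for, and no boundary handling is needed in the search itself.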