I'm trying to extract a specific phrase from a Java source file. For example, given this simple source:

class TestClass implements TestInterface, TestInterface2 {

}

class TestClass2 {

}

I want to extract "class TestClass" and "class TestClass2". I have tried different regex patterns but couldn't find a solution.

My test code snippet:

public static void wordPhraser(String sourceText) {

    Pattern p = Pattern.compile("class(\\s+)([a-zA-Z]*)");
    Matcher m = p.matcher(sourceText);
    while (m.find()) {
        System.out.println("output " + m.group());
    }
}

I also tried:

"class\\s*([a-zA-Z])"
"class\\s*[a-zA-Z]"
"^class\\s+[a-zA-Z]$"

None of these work.

Thanks.

Switch

2 Answers

Here is the regex I use:

(final|abstract|\n|^) {0,}class {1,}.{1,} {0,}\\{

That will match the whole declaration up to the opening brace, including any extends/implements clauses. Here's the code I use to strip those off and get just the class name:

        String match = m.group();//m is my matcher for the regex
        String s = match.substring(match.indexOf("class ") + "class ".length(), match.lastIndexOf("{")).trim();
        if(s.contains("extends"))
            s=s.substring(0, s.indexOf("extends"));
        if(s.contains("implements"))
            s=s.substring(0, s.indexOf("implements"));
        s=s.trim();
        strings.add(s);

NOTE: This won't work with public or private classes, only with classes that have no modifier or just final/abstract.
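For reference, here is a minimal, self-contained sketch that wires the regex and the substring code above together. The class name `SubstringClassNames` and the method `extractNames` are made up for illustration, and it assumes each declaration and its opening brace sit on one line with nothing else after the brace:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the regex and the substring logic come from the answer.
class SubstringClassNames {
    // The regex from this answer, written as a Java string literal.
    private static final Pattern DECLARATION =
            Pattern.compile("(final|abstract|\\n|^) {0,}class {1,}.{1,} {0,}\\{");

    static List<String> extractNames(String sourceText) {
        List<String> strings = new ArrayList<>();
        Matcher m = DECLARATION.matcher(sourceText);
        while (m.find()) {
            String match = m.group();
            // Keep everything between "class " and the last "{" of the match.
            String s = match.substring(match.indexOf("class ") + "class ".length(),
                    match.lastIndexOf("{")).trim();
            // Cut off the extends/implements clauses, if present.
            if (s.contains("extends"))
                s = s.substring(0, s.indexOf("extends"));
            if (s.contains("implements"))
                s = s.substring(0, s.indexOf("implements"));
            strings.add(s.trim());
        }
        return strings;
    }

    public static void main(String[] args) {
        String source = "class TestClass implements TestInterface {\n\n}\n\n"
                + "class TestClass2 {\n\n}\n";
        System.out.println(extractNames(source)); // prints [TestClass, TestClass2]
    }
}
```

Because `.{1,}` is greedy, a line with several braces would be matched up to its last `{`, so treat this as a sketch rather than a robust parser.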

Alex Coleman

I'm afraid to say that they work, but there is room for improvement:

\bclass(\s+)([a-zA-Z_]\w*)\b

is a better regex. Yours wasn't matching digits in the name (such as the 2 in TestClass2).

Note that in a Java string literal the backslashes have to be escaped, so this is how you should write it in Java:

String regex = "\\bclass(\\s+)([a-zA-Z_]\\w*)\\b";
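As a quick check, here is a small sketch that runs this pattern over source text similar to the question's. The class name `ClassPhraseDemo` and the method `findClassPhrases` are placeholders chosen for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class ClassPhraseDemo {
    // Same pattern as above, escaped for a Java string literal.
    private static final Pattern CLASS_DECL =
            Pattern.compile("\\bclass(\\s+)([a-zA-Z_]\\w*)\\b");

    // Returns every "class Name" phrase found in the source text.
    static List<String> findClassPhrases(String sourceText) {
        List<String> phrases = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(sourceText);
        while (m.find()) {
            phrases.add(m.group()); // whole match; m.group(2) would be just the name
        }
        return phrases;
    }

    public static void main(String[] args) {
        String source = "class TestClass implements TestInterface {\n\n}\n\n"
                + "class TestClass2 {\n\n}\n";
        System.out.println(findClassPhrases(source)); // prints [class TestClass, class TestClass2]
    }
}
```

Keep in mind that a regex like this also matches the word `class` inside comments and string literals, since it knows nothing about Java syntax.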

To also match leading modifiers:

\b((public|private|protected|static|abstract|final)\s*)*class(\s+)([a-zA-Z_]\w*)\b
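A short sketch of how this longer pattern could be used to pull out just the class names. With the extra groups for the modifiers, the name ends up in group 4. The class name `ModifierClassNames` and the method `extractNames` are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class ModifierClassNames {
    // Group 1: repeated modifier block, group 2: last modifier,
    // group 3: whitespace after "class", group 4: the class name.
    private static final Pattern CLASS_DECL = Pattern.compile(
            "\\b((public|private|protected|static|abstract|final)\\s*)*"
            + "class(\\s+)([a-zA-Z_]\\w*)\\b");

    static List<String> extractNames(String sourceText) {
        List<String> names = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(sourceText);
        while (m.find()) {
            names.add(m.group(4)); // m.group() would include the modifiers too
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "public final class Foo {}\n"
                + "abstract class Bar extends Foo {}\n"
                + "class Baz {}\n";
        System.out.println(extractNames(source)); // prints [Foo, Bar, Baz]
    }
}
```

The pattern does not check that the modifiers make sense together, so it is only suitable for source that already compiles.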

Martijn Courteaux
  • To be technically correct, Java names can't start with a digit, you need at least one character (your regex matches 0-n chars), and underscore is a valid character. This is accurate: `[a-zA-Z_]\\w*` – Bohemian Jul 24 '12 at 19:12
  • @Bohemian: Yes, indeed, thanks. I edited the answer. But I suppose that the source file is actually compilable :D. For example it is also impossible that `public` and `private` are used together, or that `abstract` and `final` are combined, but it will match as well. However, it looks like the OP got helped :) – Martijn Courteaux Jul 24 '12 at 19:16