
I need to extract the names of all classes, plus some other information, from a Java project. The plan is to read the text of every .java file and use regular expressions to get:

  • class: name, modifier, attributes, method
  • attributes of class: name, modifier, type
  • method: name, modifier, return type

Here is an example project:

public abstract class Shape {
    int numberOfSides;
    protected abstract double calculateArea();
    public final int getNumberOfSides() {
        return numberOfSides;
    }
}
public class Circle extends Shape{

    int radius;
    public Circle(int radius) {
        this.radius = radius;
    }

    @Override
    protected double calculateArea() {
        // TODO Auto-generated method stub
        return Math.PI*radius*radius;
    }
}
public class Square extends Shape{
    int sideLength;
    Square() {
        numberOfSides = 4;
    }
    public void setSideLengt(int sideL) {
        sideLength = sideL;
    }
    @Override
    protected double calculateArea() {
        // TODO Auto-generated method stub
        return this.sideLength*this.sideLength;
    }   
}

I used this regex: "class\\s*(?<className>[a-zA-Z][a-zA-Z\\d_\\$]*)" to get the class name, but getting the other information is very difficult. Can you help me?

Son Ruler
  • and .getClass() wasn't sufficient for you? – Stultuske Nov 28 '18 at 10:56
  • Not an answer, but I'll eat this laptop if there isn't already some API which can do this for you. – Tim Biegeleisen Nov 28 '18 at 10:56
  • You could use something like [ANTLR](https://www.antlr.org/) which can parse text based on [grammars](https://github.com/antlr/grammars-v4), on that repository some java grammars are already defined. Might be overkill though. If you can just load the `.java` file you can use reflections. – Mark Nov 28 '18 at 10:58
  • Try Enterprise Architect to import your code. Or write a Java compiler yourself. – qwerty_so Nov 28 '18 at 11:24
  • Use a Java parser. As an example, see the Java module of my open source project SimpleGraph http://wiki.bitplan.com/index.php/SimpleGraph-Java and the testcase https://github.com/BITPlan/com.bitplan.simplegraph/blob/master/simplegraph-java/src/test/java/com/bitplan/simplegraph/java/TestJavaSystem.java – Wolfgang Fahl Nov 28 '18 at 15:11
  • https://en.m.wikipedia.org/wiki/Recursive_descent_parser if you’re interested in implementing something yourself. You could knock one up for the simple syntax you’re looking for fairly quickly. Maybe combine with a SAX type listener pattern to pick up what the parser finds. – muszeo Nov 30 '18 at 07:04

1 Answer


Yes, it is much more difficult.

First, you need to acknowledge that your regex oversimplifies the problem. Look at the snippet below:

class test {
   /* class is important */ 
   class smurf {
   }
}
class test3 {
}

Your expression will indeed find all the classes. But it will not notice the nesting relation between test and smurf. Worse, it will also find classes that do not exist: the words "class is" inside the comment are a false positive that yields a class named "is". Finally, your regex doesn't give you any inheritance relationships.
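To make the false positive concrete, here is a minimal sketch that runs the regex from the question over the snippet above. The class name RegexFalsePositive and the helper findClassNames are made up for this illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFalsePositive {
    // The regex from the question, unchanged.
    private static final Pattern CLASS_NAME =
            Pattern.compile("class\\s*(?<className>[a-zA-Z][a-zA-Z\\d_\\$]*)");

    // Hypothetical helper: collects every "className" group the regex matches.
    public static List<String> findClassNames(String source) {
        List<String> names = new ArrayList<>();
        Matcher matcher = CLASS_NAME.matcher(source);
        while (matcher.find()) {
            names.add(matcher.group("className"));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "class test {",
                "   /* class is important */",
                "   class smurf {",
                "   }",
                "}",
                "class test3 {",
                "}");
        // The comment text "class is" matches as well, and the flat result
        // says nothing about smurf being nested inside test.
        System.out.println(findClassNames(source)); // prints [test, is, smurf, test3]
    }
}
```

The result is a flat list: the bogus "is" sits next to the real names, and nothing records that smurf belongs to test.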

Now imagine that you could define a great regex to detect member definitions: how would you then distinguish the members of test from the members of smurf? Your parsing logic would need to keep track of which class it is currently inside.

Very quickly, you'll experience problems parsing the method parameters and the generic parameters, since you would need to add to your parsing vocabulary the new types that were defined somewhere else.

So in the end, to solve your problem, you will need a real Java language parser, as complex as the one in your compiler. Or you may use existing tools instead of reinventing your own.
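As one possible illustration of "existing tools", the sketch below uses the parser that ships with the JDK itself (the javax.tools and com.sun.source.tree APIs) to list classes, fields and methods, including nested classes. It assumes it runs on a full JDK (Java 9 or later), not a bare JRE. The names ClassOutline, StringSource, outline and describe are hypothetical and chosen only for this example. It parses only and does no type resolution, so types are reported as written in the source.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ClassOutline {

    /** In-memory source file, so the sketch needs no files on disk. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Sample.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source text and returns one indented line per class, field and method. */
    public static List<String> outline(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(source)));
        List<String> out = new ArrayList<>();
        try {
            // parse() builds the syntax tree only; nothing is type-checked or compiled.
            for (CompilationUnitTree unit : task.parse()) {
                for (Tree decl : unit.getTypeDecls()) {
                    if (decl instanceof ClassTree) {
                        describe((ClassTree) decl, "", out);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** Records a class and its direct members, recursing into nested classes. */
    private static void describe(ClassTree c, String indent, List<String> out) {
        out.add(indent + c.getKind() + " " + c.getSimpleName()
                + " modifiers=" + c.getModifiers().getFlags()
                + " extends=" + c.getExtendsClause());
        for (Tree member : c.getMembers()) {
            if (member instanceof VariableTree) {
                VariableTree f = (VariableTree) member;
                out.add(indent + "  field " + f.getName() + " type=" + f.getType()
                        + " modifiers=" + f.getModifiers().getFlags());
            } else if (member instanceof MethodTree) {
                MethodTree m = (MethodTree) member;
                // Constructors appear with the name <init> and a null return type.
                out.add(indent + "  method " + m.getName() + " returns=" + m.getReturnType()
                        + " modifiers=" + m.getModifiers().getFlags());
            } else if (member instanceof ClassTree) {
                describe((ClassTree) member, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        String source = "public abstract class Shape {\n"
                + "    int numberOfSides;\n"
                + "    /* class is important */\n"
                + "    protected abstract double calculateArea();\n"
                + "    class Inner { }\n"
                + "}\n";
        outline(source).forEach(System.out::println);
    }
}
```

Because the parser understands comments and nesting, the text in the comment is ignored and the indentation of each line reflects which class a member belongs to. For a whole project, the same idea would be applied to each .java file; dedicated libraries such as the ANTLR grammars mentioned in the comments are alternatives.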

Christophe