
I am looking for a Java library (a source code parser) that would help me extract the unqualified names of all classes used in a source file. For example, given the following code:

public class Example {

    private ClassName1 field;

    protected ClassName2 instance = new ClassName2();

    public Example() {
        ClassName3 test = new ClassName3();
    }

    public void doSomething() {
        //ClassName4 test = new ClassName4("SomeExampleString");
        ClassName5 test = new ClassName5("ExampleString2");
    }

}

I need to get the following list:

ClassName1, ClassName2, ClassName3, ClassName5

as this is the list of all names of classes that are being "used" in the source code.

So far I have tried to write a simple parser that does this, but it is not robust enough for real-world use. I have also looked into a few Java parsers, but I don't know what this problem is called, so I don't know what to search for in their code. I believe existing Java parsers already solve it.

So what I am looking for is a Java source parser that lets me obtain a list of class names like the one in the example, along with a short example of how to achieve this, or pointers on where to look and what this problem is properly called.

NOTE: I am not looking for a way to detect all classes loaded by the JVM or all classes on the classpath, but a way to detect classes in the textual sense, by parsing the original, uncompiled Java source code.

Klemen

1 Answer


If you're just looking for a robust parser, javaparser looks pretty good.
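To give an idea of what the parser route can look like, here is a minimal sketch. Note that it does not use javaparser: it uses the JDK's own Compiler Tree API (`com.sun.source`), so it needs a JDK but no extra dependency. The class name `UsedTypeNames` and the method `find` are made up for illustration. It only parses the source (nothing is compiled or resolved) and only looks at declared variable types and `new` expressions, which is enough for the example in the question. Return types, casts, `extends` clauses, annotations and static calls such as `Foo.bar()` would need further visitor methods.

```java
import com.sun.source.tree.ArrayTypeTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.NewClassTree;
import com.sun.source.tree.ParameterizedTypeTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
final class UsedTypeNames {

    /** Parses (but does not compile) the source and returns the simple type names it finds. */
    static Set<String> find(String source) {
        // Returns null on a bare JRE; a JDK is assumed here.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        // Wrap the source string so javac can read it without a file on disk.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Example.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostic -> { }, null, null, List.of(file));

        Set<String> names = new TreeSet<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitVariable(VariableTree node, Void unused) {
                collect(node.getType(), names);      // fields, locals, parameters
                return super.visitVariable(node, unused);
            }

            @Override
            public Void visitNewClass(NewClassTree node, Void unused) {
                collect(node.getIdentifier(), names); // new Foo(...)
                return super.visitNewClass(node, unused);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Adds the unqualified name(s) found in a type tree; primitives and null are skipped. */
    private static void collect(Tree type, Set<String> names) {
        if (type instanceof IdentifierTree) {
            names.add(((IdentifierTree) type).getName().toString());
        } else if (type instanceof MemberSelectTree) {
            // a.b.Foo -> Foo
            names.add(((MemberSelectTree) type).getIdentifier().toString());
        } else if (type instanceof ParameterizedTypeTree) {
            ParameterizedTypeTree generic = (ParameterizedTypeTree) type;
            collect(generic.getType(), names);
            generic.getTypeArguments().forEach(arg -> collect(arg, names));
        } else if (type instanceof ArrayTypeTree) {
            collect(((ArrayTypeTree) type).getType(), names);
        }
    }
}
```

Calling `UsedTypeNames.find(...)` with the text of a source file returns a sorted set of names. Commented-out code is skipped because the parser discards comments, and variable and method names are not reported because only type positions are inspected.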

You may want to check out this question where the solution is to view all of the classes loaded by the JVM by using the -verbose:class flag. One answer there also mentions using reflection (which was my initial reaction) with this API.

If that question doesn't completely solve your problem (since it would show all classes loaded, "used" or not), and you're not having any luck with reflection, you could try something like this that combines the first solution from there with my idea:

  • Use whatever parser you have to parse tokens in the source code
  • Use the -verbose:class flag when running some main program that instantiates the class in the file you want to check
  • grep whatever tokens your parser tokenized from that output

So, some program Main.java:

public class Main {
    public static void main(String[] args) {
        Example e = new Example();
    }
}

Your (or some other) parser with a main method (pseudo-code):

tokens = parse_tokens()        # identifiers found in the source file
print("\\|".join(tokens))      # prints Tok1\|Tok2\|..., an alternation for grep

And in bash:

javac *.java
TOKS="$(java MyParser Example.java)"
java -verbose:class Main | grep "${TOKS}"

That way, you don't need a robust parser, just something to tokenize Java code. Just a thought; I'm not sure whether it would work perfectly.
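As a rough sketch of the tokenizing step, the `MyParser` program could be written with `java.io.StreamTokenizer` from the standard library. The method names `capitalizedWords` and `grepPattern` are made up for illustration, and keeping only words that start with an upper-case letter is an assumption of this sketch (it relies on the usual Java naming convention, so it will also pick up constants and miss lower-case class names).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical tokenizer for the "MyParser" step; a heuristic, not a real parser.
final class MyParser {

    /** Returns the distinct words starting with an upper-case letter, in order of appearance. */
    static Set<String> capitalizedWords(Reader source) {
        StreamTokenizer st = new StreamTokenizer(source);
        st.ordinaryChar('/');        // by default '/' alone starts a line comment
        st.ordinaryChar('.');        // by default '.' and '-' are numeric and would
        st.ordinaryChar('-');        // be glued into words such as "System.out"
        st.slashSlashComments(true); // skip // comments
        st.slashStarComments(true);  // skip /* ... */ comments
        st.wordChars('_', '_');
        st.wordChars('$', '$');

        Set<String> words = new LinkedHashSet<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // String and char literals come back as quote tokens, not words.
                if (st.ttype == StreamTokenizer.TT_WORD
                        && Character.isUpperCase(st.sval.charAt(0))) {
                    words.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    /** Joins the words with \| so the result can be passed to grep as one pattern. */
    static String grepPattern(Set<String> words) {
        return String.join("\\|", words);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MyParser <File.java>");
            return;
        }
        try (Reader in = Files.newBufferedReader(Path.of(args[0]))) {
            System.out.println(grepPattern(capitalizedWords(in)));
        }
    }
}
```

The declared class itself (`Example` in the question) also appears in the output, since the tokenizer cannot tell a declaration from a use.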

tplusk
  • I am looking for a way to do this without compiling the code, so I need a Java parser that can return a list of the class names referenced in the source code. Right now I have my own parser that "tokenizes" the code and returns values like [Example, String] for your Main.java example, but it also returns things like [e, main, args, public, class, ...]. So I am looking for a proper Java parser that returns only the actual class names, not member names, method names, variable names, or other elements. – Klemen Oct 06 '19 at 14:56