4

For a particular problem I need to parse a Java source code fragment that may be incomplete. For example, the code can refer to variables that are not defined in the fragment.

I would still like to parse such incomplete Java code, transform it into a convenient, inspectable representation, and be able to generate source code back from that abstract representation.

What is the right tool for this? In this post I found suggestions to use ANTLR, JavaCC or the Eclipse JDT. However, I did not find any reference to handling incomplete Java source code fragments, hence this question (in addition, the linked question is more than two years old, so I am wondering whether something new has appeared).

As an example, the code could be something like the following expression:

"myMethod(aVarName)"

In that case, I would like to be able to somehow detect that the variable aVarName is referenced in the code.

Sergio
  • I'm not sure but I don't think that this is possible to do in Java. I could definitely be wrong though – ghostbust555 Aug 04 '13 at 20:04
  • Should it be an automatic or a manual process? In the latter case, I would recommend using Eclipse and its Quick Fixes to complete the source code. – t777 Aug 04 '13 at 20:17
  • it should be automatic – Sergio Aug 04 '13 at 20:30
  • Why is *parsing* Java with undefined variable names hard? All you need is a Java grammar and a parsing engine; there are plenty of those around, even off the shelf. What is it that you want to do, for which "undefined names" is a problem? If the code is syntactically malformed, that's a different problem. – Ira Baxter Aug 04 '13 at 23:34

4 Answers

6

Uhm... This question does not have anything even vaguely like a simple answer. Any of the parser technologies mentioned above will let you do what you want, provided you write the correct grammar and set the parser up to do fallback parsing, for example by passing over tokens it does not recognize.

The least amount of work to get you where you're going is either to use ANTLR, which has resumable parsing and comes with a reasonably complete Java 7 grammar, or to see what you can pull out of the Eclipse JDT (which is what the Eclipse IDE uses for error and intention annotations and syntax highlighting).

Note that none of this stuff is easy: you're writing thousands of lines of code, not just importing a class and telling it to go.

Beyond a certain level of incorrectness or incompleteness, all of these strategies will fail, simply because no computer (or even person, for that matter) can discern what you mean unless you at least vaguely say it correctly.
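To make the difference between "undefined names" and "syntactically malformed" concrete, here is a minimal sketch. It does not use any of the tools named in this thread; it assumes the parser that ships with the JDK (the Compiler Tree API in com.sun.source, reached through javax.tools), and it assumes the fragment is a statement or expression that can be wrapped in a dummy method. The names FragmentSyntaxCheck, StringSource, Wrapper and syntaxErrors are made up for illustration.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FragmentSyntaxCheck {

    /** In-memory source file holding the wrapped fragment (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Wrapper.java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns the syntax error messages reported while parsing the fragment. */
    static List<String> syntaxErrors(String fragment) {
        // Assumption: the fragment is a statement or expression, so it fits in a method body.
        String wrapped = "class Wrapper { void m() { " + fragment + "; } }";
        // Note: getSystemJavaCompiler() returns null when no compiler is available (plain JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null, List.of(new StringSource(wrapped)));
        try {
            // Parse only: names are not resolved at this stage, so undefined
            // variables and methods do not produce errors.
            task.parse();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add(d.getMessage(Locale.ROOT));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(syntaxErrors("myMethod(aVarName)")); // undefined names only
        System.out.println(syntaxErrors("myMethod(aVarName"));  // missing ')'
    }
}
```

With this sketch, the first call should report no errors even though neither myMethod nor aVarName is defined, while the second reports at least one syntax error. How much of a malformed fragment can still be recovered into a usable tree depends on the parser's error recovery.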

lscoughlin
  • I see. Do you think that I should be able to use for my purposes the Eclipse JDT as a stand-alone library ? (I mean, without having to start a full Eclipse instance for that ?) – Sergio Aug 04 '13 at 20:32
  • The short answer is yes, but again, it won't be a simple process as you'll have to figure out which set of JAR files you'll need to extract and puzzle out how to use them. You'd probably be better off with Xtend if you're married to the eclipse tool chain. Both Xtend and ANTLR have far better general documentation. There is also a tool called spoon which might get you closer to your target: http://spoon.gforge.inria.fr/ – lscoughlin Aug 04 '13 at 20:38
  • So it seems that JDT is not the best idea for this. Do you think that JavaCC may be an option? I do not get why you suggested Xtend: on its web page it looks like a general-purpose language. Does it provide any facility for obtaining a convenient representation of incomplete Java source code? – Sergio Aug 04 '13 at 20:59
3

If you just want basic parsing - an undecorated AST - you can use existing Java parsers. But from your question I understand you're interested in deeper inspection of the partial code. First, be aware the problem you are trying to solve is far from simple, especially because partial code introduces a lot of ambiguities.

But there is an existing solution - I needed to solve a similar problem, and found that a nice fellow called Barthélémy Dagenais has worked on it, producing a paper and a pair of open-source tools - one based on Soot and the other (which is generally preferable) on Eclipse. I have used both and they work, though they have their own limitations - don't expect miracles.

Here's a direct link to a quick tutorial on how to start with the Eclipse-based tool.
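For the basic-parsing case mentioned at the start of this answer (an undecorated AST), here is a rough sketch of how the asker's example "myMethod(aVarName)" could be inspected. It is not based on PPA or any other tool named in this thread; it assumes the JDK's own Compiler Tree API (com.sun.source), and it assumes the fragment can be wrapped in a dummy method. FragmentIdentifiers, StringSource, Wrapper and referencedNames are hypothetical names.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FragmentIdentifiers {

    /** In-memory source file holding the wrapped fragment (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Wrapper.java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Collects simple names used in the fragment, skipping unqualified method names. */
    static Set<String> referencedNames(String fragment) {
        // Assumption: the fragment is a statement or expression, so it fits in a method body.
        String wrapped = "class Wrapper { void m() { " + fragment + "; } }";
        // Note: getSystemJavaCompiler() returns null when no compiler is available (plain JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, new DiagnosticCollector<JavaFileObject>(), null, null,
                List.of(new StringSource(wrapped)));
        Set<String> names = new LinkedHashSet<>();
        try {
            // Parse only; nothing is resolved, so the tree is undecorated.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitIdentifier(IdentifierTree node, Void p) {
                        names.add(node.getName().toString());
                        return null;
                    }

                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree call, Void p) {
                        ExpressionTree select = call.getMethodSelect();
                        if (!(select instanceof IdentifierTree)) {
                            scan(select, p); // receiver.method(...): still visit the receiver
                        }
                        scan(call.getTypeArguments(), p);
                        scan(call.getArguments(), p);
                        return null;
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(referencedNames("myMethod(aVarName)")); // prints [aVarName]
    }
}
```

This also shows the kind of ambiguity mentioned above: for a fragment such as System.out.println(x), an undecorated tree cannot tell that System is a type rather than a variable, so this sketch would report it as a referenced name too. Resolving that is what the typed approaches are for.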

Oak
3

I needed to solve a similar problem in my recent work. I tried several tools, including the Eclipse JDT ASTParser, the Python library javalang, and PPA, and I'd like to share my experience. In short, all of them can parse code fragments to some extent, but each of them occasionally fails when the fragment is too ambiguous.

  • Eclipse JDT ASTParser

Eclipse JDT ASTParser is the most powerful and most widely used of these tools. The following snippet parses a statement and visits its method invocation node.

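import java.util.Map;
import org.eclipse.jdt.core.JavaCore;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.Block;
import org.eclipse.jdt.core.dom.MethodInvocation;

// Configure the parser to treat the source as a sequence of statements.
ASTParser parser = ASTParser.newParser(AST.JLS8);
parser.setResolveBindings(true);
parser.setKind(ASTParser.K_STATEMENTS);
parser.setBindingsRecovery(true);
Map<String, String> options = JavaCore.getOptions();
parser.setCompilerOptions(options);
parser.setUnitName("test");

String src = "System.out.println(\"test\");";
String[] sources = { };
// Adjust this path to your own JDK installation.
String[] classpath = {"C:/Users/chenzhi/AppData/Local/Programs/Java/jdk1.8.0_131"};

parser.setEnvironment(classpath, sources, new String[] { }, true);
parser.setSource(src.toCharArray());
// With K_STATEMENTS, the root of the resulting AST is a Block.
final Block block = (Block) parser.createAST(null);
block.accept(new ASTVisitor() {
    @Override
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false; // do not descend into child nodes
    }
});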

Pay attention to parser.setKind(ASTParser.K_STATEMENTS): it sets the kind of construct to be parsed from the source. ASTParser defines four kinds (K_COMPILATION_UNIT, K_CLASS_BODY_DECLARATIONS, K_EXPRESSION, K_STATEMENTS); see this javadoc to understand the difference between them.

  • javalang

javalang is a simple python library. This is a code snippet to parse the method invocation node.

import javalang

src = 'System.out.println("test");'
# Tokenize the fragment, then parse it as a single expression.
tokens = javalang.tokenizer.tokenize(src)
parser = javalang.parser.Parser(tokens)
try:
    ast = parser.parse_expression()
    if type(ast) is javalang.tree.MethodInvocation:
        print(ast)
except javalang.parser.JavaSyntaxError as err:
    print("wrong syntax", err)

Pay attention to ast = parser.parse_expression(). Just as with parser.setKind() in the Eclipse JDT ASTParser, you must call the parsing function that matches your fragment, or you will get a javalang.parser.JavaSyntaxError exception. You can read the source code to figure out which function to use.

  • PPA

Partial Program Analysis for Java (PPA) is a static analysis framework that transforms the source code of an incomplete Java program into a typed Abstract Syntax Tree. As @Oak said, this tool comes from academia.

PPA comes as a set of Eclipse plug-ins, which means it needs to run within Eclipse. It provides a headless mode that runs without displaying the Eclipse GUI or requiring any user input, but it is still quite heavyweight.

String src = "System.out.println(\"test\");";
ASTNode node = PPAUtil.getSnippet(src, new PPAOptions(), false);

// Walk through the AST of the parsed snippet.
node.accept(new ASTVisitor() {
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false;
    }
});
coder.chenzhi
2

Eclipse contains just that: a compiler that can cope with incomplete Java code (basically, that was one reason for these guys to implement their own Java compiler; see here for a better explanation).

There are several tutorials that explain the ASTParser, here is one.

wrm