
What is the right way to check whether a given line is Java code?

Input: LogSupport.java:44 com/sun/activation/registries/LogSupport log (Ljava/lang/String;)V

Expected Output: false.

Input: Scanner in = new Scanner(System.in);

Expected Output: true.

I tried the Eclipse JDT ASTParser to check whether an AST can be created from the line. Here's the code:

public static boolean isJava(String line) {
    boolean isJava = false;
    ASTParser parser = ASTParser.newParser(AST.JLS3);
    parser.setSource(line.toCharArray());
    parser.setResolveBindings(false);
    ASTNode node = null;

    parser.setKind(ASTParser.K_STATEMENTS);
    try {
        node = parser.createAST(null);
        if (node == null) return false;
        isJava = true;
    } catch (Exception e) {
        return false;
    }
    return isJava;
}

But this does not work. Any ideas? Thanks!

Venkatesh V
  • What does "this does not work" mean? Do you get an error? Is the output different from what you expect? Maybe the parser expects a complete Java source file and not just a single line (which is not a valid, complete Java source file). – Jesper Apr 23 '15 at 07:25
  • Indeed, Java is not a language that is compiled or interpreted line by line. Take a variable declaration as an example: you can't build that into an AST in isolation, because the grammar rules can't determine whether it is a local variable in a method or a class attribute. – Gimby Apr 23 '15 at 07:28
  • Which statement are you parsing? What is there in `line.toCharArray()`? – LittlePanda Apr 23 '15 at 07:28
  • What do you need to distinguish your code from? E.g., `i++;` is valid in both Java and C++. Would your input contain the whole file or just one line? Your example contains only one line. If it's streamed line by line, is `));` written in Java? What if it comes on the line after `f(g(5`? So you need the whole file, or at least a block. What if your input is part of commented-out Java code? Should it be considered valid Java? It can contain anything. – Qualtagh Apr 23 '15 at 07:35
  • The value of node for the malformed input is an empty block "{ }", i.e., the node type is 8 (Block). So if a line has a syntax error, there is no way to distinguish it from a grammatically well-formed line. Yes, line.toCharArray() is the input line I mentioned. My input will contain only one line. So it seems this problem is not solvable and not well defined. – Venkatesh V Apr 23 '15 at 08:56

2 Answers


Try BeanShell:

http://www.beanshell.org/intro.html

Java evaluation features:

Evaluate full Java source classes dynamically as well as isolated Java methods, statements, and expressions.

Summary of features

Dynamic execution of the full Java syntax, Java code fragments, as well as loosely typed Java and additional scripting conveniences.

Transparent access to all Java objects and APIs.

Runs in four modes: Command Line, Console, Applet, Remote Session Server. Can work in security constrained environments without a classloader or bytecode generation for most features.

The interpreter is small ~150K jar file.

Pure Java.

It's Free!!

The link below has some other options you could try:

Syntax Checking in Java

Raj

What you apparently want is to decide whether a string you have is a valid substring of the Java language.

Obviously, to do this, you need a full Java parser as a foundation. Some parsing machinery may let you try parsing the string as a nonterminal in the language; this is relatively easy to do with a recursive descent parser. (It appears the Eclipse parser offers that, based on OP's example.)
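As a rough illustration of parsing a string as a single nonterminal without a third-party parser, the sketch below wraps the line in a dummy method body and runs only the parse phase of the JDK's own compiler, through `javax.tools` and `com.sun.source.util.JavacTask`. This is neither the Eclipse nor the DMS approach mentioned in this answer. The names `JavaLineCheck`, `StringSource`, `Wrapper`, and `looksLikeJavaStatement` are made up for the example, and it assumes the code runs on a JDK (Java 9 or later), not a bare JRE.

```java
import com.sun.source.util.JavacTask;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.List;

final class JavaLineCheck {

    /** In-memory source file that holds the wrapped line (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns true if the line parses as the body of a method, i.e. as statements. */
    static boolean looksLikeJavaStatement(String line) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; run on a JDK");
        }

        // The newline after the line keeps a trailing // comment from hiding the braces.
        String wrapped = "class Wrapper { void m() {\n" + line + "\n} }";

        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null,
                List.of(new StringSource("Wrapper", wrapped)));
        try {
            // Parse only: no symbol resolution, so unknown types are not errors.
            task.parse();
        } catch (IOException e) {
            return false;
        }

        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                return false;
            }
        }
        return true;
    }
}
```

Because only the parser runs, the check is purely syntactic: the `Scanner` line from the question should be accepted even though nothing is imported, and the `LogSupport.java:44 ...` line should be rejected. The sketch has clear limits. It accepts blank lines and comments, it can be fooled by input that closes the wrapper method itself, and it does not handle arbitrary substrings of the kind discussed next.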

But if you want to accept an arbitrary substring, e.g., to decide that

        57).x=2; foo[15].bar(abc>=

is a valid Java fragment, you need parsing machinery specialized to handle this.

Our DMS Software Reengineering Toolkit with its Java Front End will do this. The parser APIs provide facilities for "parse a full compilation unit", "parse a nonterminal", and "parse a substring". The first two return trees; the last returns a sequence of trees. It isn't quite an arbitrary substring: you can't start or end in the middle of a token (e.g., a string literal). Other than that, it will parse arbitrary substrings.

Ira Baxter