To decide whether you have the right answer, you ideally have to compare against some kind of standard. That is hard for computer languages.
Comparing ASTs is going to be hard, because there is no standard for them. Each parser that builds ASTs builds one whose structure was designed by whoever coded the parser.
That means if you build an AST-producing parser and then get somebody else's AST-producing parser, you'll discover that the AST nodes you have chosen don't match the other parser's. Now you have to build a mapping from your AST to the other one (and how will you know the mapping is valid?). You can try to make your parser generate the same AST as another parser, but what you will discover is that the AST you produce is influenced by the parsing technology you use.
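To make the mismatch concrete, here is a tiny sketch, with every class name invented for illustration, of how two parsers might shape the very same expression `b * c` differently:

```java
// Two invented AST designs for the same Java expression "b * c".

// Parser A: one generic node type; the operator is just a string.
class BinaryExpr {
    final String op;
    final Object left, right;
    BinaryExpr(String op, Object left, Object right) {
        this.op = op; this.left = left; this.right = right;
    }
}
class Ident {
    final String name;
    Ident(String name) { this.name = name; }
}

// Parser B: one node class per operator, with children typed as Expr.
interface Expr {}
class Multiply implements Expr {
    final Expr left, right;
    Multiply(Expr left, Expr right) { this.left = left; this.right = right; }
}
class NameRef implements Expr {
    final String name;
    NameRef(String name) { this.name = name; }
}

public class AstShapeDemo {
    public static void main(String[] args) {
        Object fromA = new BinaryExpr("*", new Ident("b"), new Ident("c"));
        Expr fromB = new Multiply(new NameRef("b"), new NameRef("c"));
        // Mapping A's trees onto B's takes a case per operator string,
        // and nothing certifies that the mapping preserves meaning.
    }
}
```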
We have a similar problem with the Java front end my company produces (see bio if you want to know more). What we settle for is testing that the answer is self-consistent, plus a lot of long-term experiential testing on big pieces of code.
Our solution is to:
- Build a parser using the strongest parsing technology we can get (GLR). This means we can recognize certain constructs not easily recognized by other parsing technologies (LL, LR, ...), and thus produce AST nodes that other parsers would have a hard time producing; the first sketch after this list shows an example where this matters. Even so, we produce AST nodes in a way that completely avoids the hand-coded AST construction demanded by most other parsing technologies; hand-coding tends to produce somewhat different ASTs.
- Parse a LOT of Java code (producing ASTs) to make sure we don't have parsing errors. [The JDK is a pretty good-sized example, and it is easy to get.]
- Our tools can take an AST and regenerate (prettyprint) source code, complete with comments but perhaps with a somewhat different layout. We verify that parsed-then-prettyprinted code also parses. We then re-prettyprint the parsed prettyprinted version; this should be identical to the first prettyprinted version, since we always produce the same layout (the second sketch after this list shows this check). This test is a good indication that our AST design and implementation doesn't lose anything about the source code.
- Build symbol tables, resolve the meaning of names, and verify that legal Java programs type-check according to our front end. That doesn't tell you anything about the nature of the AST except that it is good enough (which, in fact, is good enough!). Because the type-checking task is very complex (go check your local Java standard), it is also pretty fragile: if you don't have everything right, type checking will likely fail somewhere when applied across a large body of code. Again, the JDK is a pretty good test of this; the third sketch after this list shows the shape of such a corpus check. Note: a Java parser without name and type resolution isn't very useful in practice.
- Produce JavaDoc-like cross references, including hyperlinked source code, from the above results. This makes it easy to manually check a bit of code to see that name resolution (and therefore AST construction) is sane.
- Live with the results, applying the front end to various program analysis and transformation tasks. We find the occasional problem and fix it.
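For the first bullet, here is one well-known construct, chosen here purely as an illustration, that weaker parsing technologies must special-case: nested generic type arguments end in `>>`, which a conventional Java lexer hands to the parser as a single right-shift token, so the parser never sees the two closing angle brackets it needs:

```java
// Legal Java in which the same two characters ">>" mean different things.
import java.util.List;
import java.util.Map;

public class ShiftVsGenerics {
    // Here ">>" closes two nested type-argument lists...
    Map<String, List<Integer>> table;

    int shift(int x) {
        // ...and here ">>" is the arithmetic right-shift operator.
        return x >> 2;
    }
}
```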
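Here is the prettyprint round-trip check from the third bullet, sketched minimally; `Parser` and `PrettyPrinter` are hypothetical stand-ins for whatever front end is under test:

```java
// Round-trip test: parse -> print -> reparse -> reprint, then compare.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RoundTripCheck {
    interface Parser { Object parse(String source); }      // source text -> AST
    interface PrettyPrinter { String print(Object ast); }  // AST -> source text

    static void check(Parser parser, PrettyPrinter printer, Path file) throws Exception {
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

        // The prettyprinted form of the original must itself parse...
        String printed = printer.print(parser.parse(original));
        Object reparsed = parser.parse(printed);

        // ...and reprinting it must give back the identical text, since the
        // printer always produces the same layout for the same AST.
        String reprinted = printer.print(reparsed);
        if (!printed.equals(reprinted)) {
            throw new AssertionError("prettyprint is not stable for " + file);
        }
    }
}
```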
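And here is the shape of the corpus-wide type-check test from the fourth bullet; `FrontEnd` is again a hypothetical stand-in, and the JDK sources are one convenient corpus:

```java
// Walk a source tree, run parse + name/type resolution on every .java file,
// and count the failures; each failure is a lead on a front-end bug.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class CorpusCheck {
    interface FrontEnd {
        Object parse(Path javaFile) throws Exception;            // -> AST
        void resolveNamesAndTypes(Object ast) throws Exception;  // throws on inconsistency
    }

    static long countFailures(FrontEnd frontEnd, Path corpusRoot) throws Exception {
        try (Stream<Path> files = Files.walk(corpusRoot)) {
            return files.filter(p -> p.toString().endsWith(".java"))
                        .filter(p -> {
                            try {
                                frontEnd.resolveNamesAndTypes(frontEnd.parse(p));
                                return false;  // type-checked cleanly
                            } catch (Exception e) {
                                return true;   // parse or resolution failure
                            }
                        })
                        .count();
        }
    }
}
```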
It is tough to get this right; you have to get close and keep the testing pressure on continuously, especially since the Java language keeps moving. (We're at Java 8, and Java 9 is being threatened.) Bottom line: it is a lot of work to build such a parser and check its sanity.
We'd love to have an independent set of tests, but we haven't seen one in the wild. And I would expect that such tests, if they exist (I assume Oracle and IBM have them), don't really test parsing and name resolution directly, but rather test that some bit of code compiles and runs, producing a known result. Since we aren't building a compiler, we couldn't run such tests even if we had them. We could, however, run the name-resolution and type-consistency checks, and that would be helpful.
[We actually do this for a number of language front ends. If you think Java is hard, try this with C++.]