
As far as I know, the only way to parse Java source code into an AST (Abstract Syntax Tree) is to use the Java Compiler Tree API: com.sun.source.tree.

I have two questions:

  1. What JDKs support com.sun.source.tree?
  2. Is there a portable replacement that works for all JDKs?
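For context, here is a minimal sketch of the Compiler Tree API route the question describes. It assumes a full JDK 9 or later, where the jdk.compiler module exports com.sun.source.* (on JDK 8, tools.jar has to be on the classpath instead), and that the system compiler is javac, so the cast to JavacTask succeeds. ParseWithTreeApi, inMemorySource and methodNames are hypothetical names used only for illustration.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParseWithTreeApi {

    // Wraps a string as a .java source file so nothing has to exist on disk.
    static JavaFileObject inMemorySource(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses (but does not compile) the code and returns "Class.method" for
    // every method declared directly in a top-level type.
    public static List<String> methodNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: running on a JRE?");
        }
        // With javac, the returned task is a JavacTask, which exposes parse().
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(inMemorySource(className, code)));
        List<String> names = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                for (Tree type : unit.getTypeDecls()) {
                    if (!(type instanceof ClassTree)) {
                        continue;
                    }
                    ClassTree cls = (ClassTree) type;
                    for (Tree member : cls.getMembers()) {
                        if (member instanceof MethodTree) {
                            names.add(cls.getSimpleName() + "." + ((MethodTree) member).getName());
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String code = "class Demo { void a() {} int b(int x) { return x; } }";
        System.out.println(methodNames("Demo", code)); // [Demo.a, Demo.b]
    }
}
```

Note that parse() only builds the syntax tree; no types or symbols are resolved at this stage.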
Gili
  • If I'm not mistaken, Eclipse uses a different version of the Java model with its own parser, and there might be a way to reuse that for general parsing. – Uri Dec 28 '09 at 04:41
  • What do you mean by "support" in your first question? Are you asking which versions of Java from which vendors contain the com.sun.source.tree package? I would imagine only Sun's does. If you want to parse source code with another JDK (say, IBM's), then a standalone parser library is probably necessary. – Brett Daniel Dec 28 '09 at 04:49
  • @Brett, I know that com.sun.source.tree was only introduced in JDK6. I'm wondering if all non-Sun JDKs support this API. – Gili Dec 28 '09 at 04:51
  • com.sun.* packages are not portable. They may exist in other JDKs, but do not count on it. – TofuBeer Dec 28 '09 at 04:56
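Following the portability point, one way to avoid counting on com.sun.* is to probe for it at runtime and fall back to a standalone parser when it is missing. This is a sketch under that assumption: TreeApiAvailability and its methods are hypothetical names, and the checks go through reflection so the class itself does not link against com.sun.source.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class TreeApiAvailability {

    // True if the com.sun.source.tree package can be loaded by this runtime.
    // Checked reflectively so this class has no hard dependency on it.
    public static boolean treeApiPresent() {
        try {
            Class.forName("com.sun.source.tree.CompilationUnitTree");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True if a system compiler exists AND its tasks implement
    // com.sun.source.util.JavacTask (the entry point to the Tree API).
    public static boolean javacTaskAvailable() {
        if (!treeApiPresent()) {
            return false;
        }
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return false; // e.g. a JRE without a bundled compiler
        }
        try {
            Class<?> javacTask = Class.forName("com.sun.source.util.JavacTask");
            // A task with no sources is enough to inspect its runtime type.
            return javacTask.isInstance(compiler.getTask(null, null, null, null, null, null));
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("com.sun.source.tree present: " + treeApiPresent());
        System.out.println("JavacTask available: " + javacTaskAvailable());
    }
}
```

If javacTaskAvailable() returns false, that is where a standalone parser library would have to take over.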

6 Answers


Regarding your second question, there are dozens of Java parsers available in addition to Sun's. Here is a small sample:

  • Eclipse's org.eclipse.jdt.core.dom package.
  • Spoon, which outputs a very nice annotated parse tree with type information and variable binding (it uses Eclipse's parser internally).
  • ANTLR, a parser generator for which Java grammars are available.
  • javaparser (which I have not used).

My best advice is to try each of them to see which works best for your needs.

rmtheis
Brett Daniel
  • What is the difference between Eclipse's jdt DOM and Spoon? – Gili May 30 '10 at 00:35
  • The actual AST classes are roughly analogous, but Spoon's parse tree includes semantic information like variable binding without requiring a massive IDE infrastructure to be running. One can parse and analyze Java files by simply adding one jar file to the classpath. – Brett Daniel May 31 '10 at 06:56
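For comparison (this is javac's own API, not Spoon's), the JDK's Tree API can also resolve variable bindings, but only after javac's analysis phase has run. The sketch below assumes a full JDK 9+ with javac as the system compiler; BindingDemo, variableBindings and isVariable are hypothetical names. It resolves every identifier that refers to a field, parameter or local variable and reports which kind it is bound to.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class BindingDemo {

    // Wraps a string as a .java source file so nothing has to exist on disk.
    static JavaFileObject inMemorySource(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    static boolean isVariable(ElementKind k) {
        return k == ElementKind.FIELD || k == ElementKind.PARAMETER || k == ElementKind.LOCAL_VARIABLE;
    }

    // Returns "name:KIND" for each identifier that is bound to a variable.
    public static List<String> variableBindings(String className, String code) {
        JavaCompiler compiler = Objects.requireNonNull(
                ToolProvider.getSystemJavaCompiler(), "no system compiler (JRE?)");
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(inMemorySource(className, code)));
        Trees trees = Trees.instance(task);
        List<String> out = new ArrayList<>();
        try {
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // attribution: binds identifiers to symbols
            for (CompilationUnitTree unit : units) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitIdentifier(IdentifierTree node, Void unused) {
                        Element element = trees.getElement(getCurrentPath());
                        if (element != null && isVariable(element.getKind())) {
                            out.add(node.getName() + ":" + element.getKind());
                        }
                        return super.visitIdentifier(node, unused);
                    }
                }.scan(new TreePath(unit), null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String code = "class Demo { int f; int m(int p) { int local = p; return local + f; } }";
        System.out.println(variableBindings("Demo", code)); // [p:PARAMETER, local:LOCAL_VARIABLE, f:FIELD]
    }
}
```

The filter matters because, after analysis, javac's tree also contains compiler-generated code such as the implicit default constructor.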

You could take tools.jar and use it directly. javac is open source, so you can also just grab that code (assuming you can deal with the license). ANTLR has grammars for Java as well.

TofuBeer
  • Redistributing tools.jar: good point! OpenJDK's classpath exception makes for a great license. – Gili Dec 28 '09 at 05:00
  • google-java-format uses `com.google.errorprone:javac-shaded` to get the AST. `javac-shaded` embeds OpenJDK parser in itself. Example can be found at `JavaInputAstVisitor.java` in google-java-format. – Winter Young Sep 23 '17 at 08:19
  • The entry point is [here](https://github.com/openjdk/jdk/blob/739769c8fc4b496f08a92225a12d07414537b6c0/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/ParserFactory.java#L53) – polkovnikov.ph May 14 '22 at 12:16
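google-java-format walks the javac AST with a visitor class. The sketch below shows the same general visitor idea using the unshaded com.sun.source.util.TreePathScanner from a full JDK (9+ assumed) instead of javac-shaded; it is not google-java-format's code, and CallLister, CallVisitor, calleeName and listCalls are hypothetical names. It lists each method call together with its enclosing method, which relies on the parent chain that TreePathScanner maintains and a plain TreeScanner does not.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class CallLister {

    // Wraps a string as a .java source file so nothing has to exist on disk.
    static JavaFileObject inMemorySource(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // foo() has an IdentifierTree select; a.b.foo() has a MemberSelectTree.
    static String calleeName(ExpressionTree select) {
        if (select instanceof MemberSelectTree) {
            return ((MemberSelectTree) select).getIdentifier().toString();
        }
        if (select instanceof IdentifierTree) {
            return ((IdentifierTree) select).getName().toString();
        }
        return select.toString();
    }

    // Records "enclosingMethod -> callee" for every call, outer calls first.
    static class CallVisitor extends TreePathScanner<Void, List<String>> {
        @Override
        public Void visitMethodInvocation(MethodInvocationTree node, List<String> out) {
            out.add(enclosingMethod() + " -> " + calleeName(node.getMethodSelect()));
            return super.visitMethodInvocation(node, out); // also visit nested calls
        }

        // Walks up the current TreePath to the nearest enclosing method.
        private String enclosingMethod() {
            for (TreePath p = getCurrentPath(); p != null; p = p.getParentPath()) {
                if (p.getLeaf() instanceof MethodTree) {
                    return ((MethodTree) p.getLeaf()).getName().toString();
                }
            }
            return "<none>";
        }
    }

    public static List<String> listCalls(String className, String code) {
        JavaCompiler compiler = Objects.requireNonNull(
                ToolProvider.getSystemJavaCompiler(), "no system compiler (JRE?)");
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(inMemorySource(className, code)));
        List<String> out = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new CallVisitor().scan(new TreePath(unit), out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String code = "class Demo { void run() { helper(); System.out.println(helper2()); } "
                + "void helper() {} int helper2() { return 1; } }";
        System.out.println(listCalls("Demo", code)); // [run -> helper, run -> println, run -> helper2]
    }
}
```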

I've used Eclipse's AST parser and found it to be pretty good (admittedly, it was for an Eclipse plug-in, so using it made sense). See "Exploring Eclipse's ASTParser".

Jeremy Raymond
  • Here is the last Web Archive snapshot of the link above: https://web.archive.org/web/20090801122725/http://www.ibm.com/developerworks/opensource/library/os-ast/ – wviana May 03 '22 at 14:43

A working, simple-to-use Java parser is... JavaParser. The project has been active for some years. It was initially hosted on Google Code and is now available on GitHub: https://github.com/javaparser/javaparser

It is quite simple to use, has few dependencies, and is also available on Maven.

Because it has been in use for a while, it works quite well. It can also parse comments, modify the AST, and regenerate the code.

Alexey Grigorev
Federico Tomassetti

I've just come across Jexast, an extraction of JDT's ASTParser that works independently of Eclipse (it depends on org.eclipse.jdt.internal.compiler.**).

I haven't tried it yet, but it does seem interesting.

Esteban Küber

It is not the only way.

See our Java Front End, which is a full-featured Java parser built on top of the DMS Software Reengineering Toolkit. It parses Java and builds ASTs as internal data structures.

The point of DMS is that it provides a huge variety of additional useful machinery (attribute grammars, symbol tables, flow analysis, AST manipulation including access and update, as well as source-to-source transformations) to analyze and transform that AST into results and/or modified source code. If you get "just" a Java parser (e.g., JavaCC + Java grammar) you will, IMHO, not be able to do a lot with it. DMS makes it possible to do a lot, without having to invent all that extra machinery yourself.

If you really don't want to use the extra machinery DMS provides, it will dump the tree as XML.

Community
Ira Baxter