Do you know of any tool that builds an AST from a Java program or class and produces an XML representation of that AST (either a collection of documents or a single XML document)?

Kind regards,
Johannes

Johannes

2 Answers

I don't know of any tool that does this directly, but http://www.antlr.org/ is the de facto tool for building ASTs from any general language, and there are several grammar files for Java that you can repurpose for your own programs. So grab ANTLR, use the latest Java grammar, and write out the XML representation you want.
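
To make the parse-then-emit idea concrete, here is a minimal sketch. Note that it does not use ANTLR: it swaps in the JDK's own Compiler Tree API (`com.sun.source`), so that it runs on a plain JDK 11+ without extra libraries or a generated parser. The names `AstToXml`, `toXml`, `XmlScanner` and `StringSource` are made up for illustration, and the element names are simply lower-cased `Tree.Kind` values, which is an assumed format and not any standard schema.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.List;

public class AstToXml {

    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Emits one XML element per AST node, named after the node's Tree.Kind. */
    static final class XmlScanner extends TreeScanner<Void, StringBuilder> {
        private int depth = 0;

        @Override
        public Void scan(Tree node, StringBuilder out) {
            if (node == null) {
                return null; // optional children (e.g. a missing initializer) are null
            }
            String tag = node.getKind().name().toLowerCase();
            String indent = "  ".repeat(depth);
            out.append(indent).append('<').append(tag).append(">\n");
            depth++;
            super.scan(node, out); // visits the children, which re-enter this method
            depth--;
            out.append(indent).append("</").append(tag).append(">\n");
            return null;
        }
    }

    /** Parses (without type-checking) the given source and returns the XML dump. */
    static String toXml(String className, String source) throws IOException {
        // Requires a JDK; on a bare JRE getSystemJavaCompiler() returns null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                List.of(new StringSource(className, source)));
        StringBuilder out = new StringBuilder();
        for (CompilationUnitTree unit : task.parse()) {
            new XmlScanner().scan(unit, out);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(toXml("Hello", "class Hello { int answer() { return 42; } }"));
    }
}
```

An ANTLR-based version would have the same overall shape: walk the tree the parser gives you and write one element per node. Identifiers, literal values and source positions are left out here; adding them as attributes would require XML-escaping the values.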

chubbsondubs

Our DMS Software Reengineering Toolkit with its Java Front End can do this directly. You ask DMS to parse the file and produce an XML dump using the command-line switch `++XML`.

See "What would an AST (abstract syntax tree) for an object-oriented programming language look like?"

As a general rule, we don't recommend this, for several reasons:

  • XML output for real files is enormous, and takes a lot of time to write and read.

  • Most people do this because they believe that, with an XML representation, just a little bit of XSLT will get them what they want; in my experience that only holds for very simple, syntax-only tasks.

  • If you intend to modify the code, once you have the XML you pretty much can't regenerate the source text from it.

  • The machinery that DMS provides (attribute grammars, symbol tables, flow analyses, pattern matching and source-to-source transformations, source regeneration from the AST) is what you really want, and you get access to it by using DMS after the parsing step, without ever exporting the XML.

Ira Baxter
  • I disagree -- if you accept XML at all, it certainly is an option for AST representation in some cases, and indeed a little bit of XSLT, or XQuery, might just do what you want on that. Ideally the XML representation of the AST is done as markup on the original input, so you can completely regenerate the source by simply taking the string value. Size and processing time may be acceptable at times. I would not recommend it for each and every parsing job, but would not disqualify it either. – Gunther Oct 25 '11 at 22:03
  • @Gunther: If it works for you, fine. My experience is that unless you want to do something really simple (e.g., something that depends only on simple syntax at best), this won't work for you. It fails big time when you need any information about symbols, or if you have a lot of code to process. – Ira Baxter Oct 25 '11 at 22:10
  • Yeah, I maybe want to visualize the evolution of the AST to some extent, just to demonstrate a simple use case of a hierarchical visualization tool. I think I would have to use `ANTLR`, as `DMS` most probably isn't open source, and we are doing research in the field of versioned tree-structured data and `secure` cloud storage thereof. I'm currently using FSMap to map some directories from my Desktop into an XML representation with some metadata, but the evolution of ASTs might also be interesting. But maybe it's too low-level for developers and a simple line-based diff is sufficient, I don't know. – Johannes Oct 31 '11 at 02:17
  • Well, ANTLR may produce an AST. If *all* you want is syntax, ANTLR might be fine. My experience is that you can't do much with a programming language without knowing what the definitions and types are, and that's a lot of work that I don't think you will get off the shelf with an ANTLR grammar for Java. – Ira Baxter Oct 31 '11 at 02:38
  • .... if you really want to compare the evolution of the code (nobody cares about the evolution of the ASTs), you might want to look at our SmartDifferencer tools: http://www.semanticdesigns.com/Products/SmartDifferencer These are DMS-based tools that compare two versions of a program by comparing their ASTs, but report the differences in terms of the source text. – Ira Baxter Oct 31 '11 at 02:39