4

I've used the Java Compiler Tree API to generate the AST for Java source files. However, I'm unable to access the comments in the source files.

So far, I've been unable to find a way to extract comments from a source file. Is there a way using the compiler API or some other tool?

Code freak

5 Answers

5

Our SD Java Front End is a Java parser that builds ASTs (and optionally symbol tables). It captures comments directly on tree nodes.

The Java Front End is a member of a family of compiler language front ends (C, C++, C#, COBOL, JavaScript, ...), all of which are supported by the DMS Software Reengineering Toolkit. DMS is designed to process languages for the purposes of transformation, and thus can capture comments, layout and formats to enable regeneration of code that preserves the original layout as much as possible.

EDIT 3/29/2012: (in contrast to answer posted for doing this with ANTLR)

To get a comment on an AST node in DMS, one calls the DMS (lisp-like) function

  (AST:GetComments <node>)

which provides access to the array of comments associated with the AST node. One can ask for the length of this array (it may be null) or, for each array element, ask for any of these properties: (AST:Get... FileIndex, Line, Column, EndLine, EndColumn, String), where String is the exact Unicode comment content.

Ira Baxter
  • @Code freak - If this is helpful, you should actually give Ira Baxter an upvote, not just comment "+1". – erickson Jun 11 '10 at 17:02
5

The comments obtained through the getCommentList method of CompilationUnit do not carry the comment body. They are also not visited during a normal AST visit. In order to visit the comments, we have to call the accept method on each comment in the comment list.

for (Comment comment : (List<Comment>) compilationUnit.getCommentList()) {

    comment.accept(new CommentVisitor(compilationUnit, classSource.split("\n")));
}

The body of each comment can be recovered with some simple logic. The AST visitor for comments below needs the compilation unit and the source code of the class, split into lines, at initialization.

import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.BlockComment;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.LineComment;

public class CommentVisitor extends ASTVisitor {

    CompilationUnit compilationUnit;

    private String[] source;

    public CommentVisitor(CompilationUnit compilationUnit, String[] source) {

        super();
        this.compilationUnit = compilationUnit;
        this.source = source;
    }

    public boolean visit(LineComment node) {

        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
        String lineComment = source[startLineNumber].trim();

        System.out.println(lineComment);

        return true;
    }

    public boolean visit(BlockComment node) {

        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
        // Subtract 1 so the offset points at the comment's last character, not one past it.
        int endLineNumber = compilationUnit.getLineNumber(node.getStartPosition() + node.getLength() - 1) - 1;

        StringBuilder blockComment = new StringBuilder();

        for (int lineCount = startLineNumber ; lineCount<= endLineNumber; lineCount++) {

            String blockCommentLine = source[lineCount].trim();
            blockComment.append(blockCommentLine);
            if (lineCount != endLineNumber) {
                blockComment.append("\n");
            }
        }

        System.out.println(blockComment.toString());

        return true;
    }

    public void preVisit(ASTNode node) {

    }
}

Edit: Moved splitting of source out of the visitor.

Unni Kris
  • This is really the answer? So, for each comment, you blast the source text into N lines (I suppose you could do this just once for a set) and pick out the Nth? Does that really get the comment, or just the raw line containing the comment? What happens if there are 2 comments in the same line? – Ira Baxter Mar 27 '12 at 07:32
  • You are right, the code gives you the line that contains the comment. But you can still implement the logic to extract the comment or comments from that raw line. Having something is better than nothing. Also, to avoid splitting the source every time, the source can be split once and the resulting array passed to the visitor. – Unni Kris Mar 28 '12 at 08:04
  • An arcane example: /* c1 */ foo /* c2 */ bar // c3 ... so if I have the tree for bar, how do I know which of these comments apply? – Ira Baxter Mar 29 '12 at 05:46
  • I agree this process is not effective when we have both comments and code on the same line. Maybe we could do more string manipulation, searching for '//' and '/*' instances in that line, but that now seems highly buggy. – Unni Kris Mar 29 '12 at 09:30
  • Our experience is BIBSEH: "Because In Big Systems, Everything Happens". Agreed, this arcane example is likely rare. But, BIBSEH, so it will occur, usually at the most inconvenient moment. – Ira Baxter Mar 29 '12 at 14:30
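
The same-line case discussed in these comments can be handled by scanning the source text character by character instead of line by line. What follows is only a minimal sketch of that idea, not what either commenter implemented: `CommentScanner`, `Found` and `scan` are hypothetical names, it uses no parser library, and it assumes plain Java source without text blocks or Unicode escapes. It skips string and character literals so that a `//` inside a literal is not mistaken for a comment, and it records the exact start offset of each comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts every comment with its exact text and offset.
class CommentScanner {

    /** One comment: its 0-based start offset in the source and its exact text. */
    static final class Found {
        final int start;
        final String text;

        Found(int start, String text) {
            this.start = start;
            this.text = text;
        }
    }

    /** Returns all line and block comments in source order. */
    static List<Found> scan(String src) {
        List<Found> out = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // String or char literal: comment markers inside it do not count.
                i = skipLiteral(src, i);
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: runs to the end of the line (or of the input).
                int end = i;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') {
                    end++;
                }
                out.add(new Found(i, src.substring(i, end)));
                i = end;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: runs to the closing marker; an unterminated one runs to the end.
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                out.add(new Found(i, src.substring(i, end)));
                i = end;
            } else {
                i++;
            }
        }
        return out;
    }

    /** Returns the index just past the literal that opens at position open. */
    private static int skipLiteral(String src, int open) {
        char quote = src.charAt(open);
        int i = open + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
                continue;
            }
            if (c == quote || c == '\n') {
                return i + 1; // closing quote, or a broken literal ending at the line break
            }
            i++;
        }
        return src.length();
    }
}
```

For the line `/* c1 */ foo /* c2 */ bar // c3` this yields three separate entries, each with its own offset. Deciding which comment belongs to which tree node is still a separate problem; comparing these offsets with the start and end positions of the nodes is one possible starting point.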
1

Just for the record. Now with Java 8 you have a whole interface to play with comments and documentation details here.
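
The link behind "here" did not survive, so which interface was meant is not certain. One candidate, named purely as an assumption, is `com.sun.source.util.DocTrees` from the JDK's Compiler Tree API. The sketch below parses a source string with `JavacTask` and asks `DocTrees` for the documentation comment attached to each method. The class name `DocCommentDump` and the method `docComments` are hypothetical, it needs a JDK (not just a JRE) at run time, and it only reports `/** ... */` documentation comments, not ordinary `//` or `/* ... */` comments.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: maps method names to their documentation comment text.
class DocCommentDump {

    static Map<String, String> docComments(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        // Wrap the source string so the compiler can read it like a file.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };

        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        DocTrees trees = DocTrees.instance(task);
        Map<String, String> result = new LinkedHashMap<>();

        try {
            // Parsing alone is enough here; no attribution or code generation is run.
            for (CompilationUnitTree unit : task.parse()) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        String doc = trees.getDocComment(getCurrentPath());
                        if (doc != null) {
                            result.put(method.getName().toString(), doc);
                        }
                        return super.visitMethod(method, unused);
                    }
                }.scan(new TreePath(unit), null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

A method preceded only by a `//` comment gets no entry in the returned map, which is the main limitation of this route for the original question.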

enissay
0

You might need to use a different tool, like ANTLR's Java grammar. javac has no use for comments, and is likely to discard them completely. The parsers upon which tools like IDEs are built are more likely to retain comments in their AST.

erickson
  • As far as I know, NetBeans uses the Java tree API for syntax highlighting, code completion, checking, etc. – Code freak Jun 11 '10 at 08:15
  • Check out its formatter. That's going to use a tree parser to correctly handle the code, but it also formats comments. Also, ANTLR has special support for comment nodes, so I'm pretty sure its Java grammar will handle this case. – erickson Jun 11 '10 at 17:04
0

Managed to solve the problem by using the getsourceposition() and some string manipulation (no regex was needed)

Code freak