20

Can someone provide a detailed example as to how I can do this using antlr4? Instructions right from installing antlr4 and its dependencies would be highly appreciated.

user3266901
  • I can't give you a detailed example, but if you find some java grammar for antlr4, you can use the new antlr4 features (visitors generation). It's well described in their excellent book. You could start from here https://github.com/antlr/grammars-v4/tree/master/java – Leo Feb 03 '14 at 18:06
  • But there aren't well-documented examples for newbies like me. I know how to download the Java.g4 grammar and create the Tokens etc. But I don't have a clue as to what I should do after that. I reckon that a complete detailed example would help me and many other people. – user3266901 Feb 03 '14 at 18:16
  • of course there is! :-) http://leonotepad.blogspot.com.br/2014/01/playing-with-antlr4-primefaces.html – Leo Feb 03 '14 at 18:18
  • this book is also great and it can be understood by newbies (like you and me) http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference – Leo Feb 03 '14 at 18:20
  • Thanks Leo; added your article to antlr4 wiki home: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Articles+and+Resources – Terence Parr Feb 03 '14 at 18:27

3 Answers

29

Here it is.

First, you're gonna buy the ANTLR4 book ;-)

Second, download the ANTLR 4 jar and the Java grammar (http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference)

Then, you can change the grammar a little bit, adding these to the header

    (...)
grammar Java;

options 
{
    language = Java;
}

// starting point for parsing a java file
compilationUnit
    (...)

I'll change a little thing in the grammar just to illustrate something.

/*
methodDeclaration
    :   (type|'void') Identifier formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;
*/
methodDeclaration
    :   (type|'void') myMethodName formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;

myMethodName
    :   Identifier
    ;

You see, the original grammar does not let you distinguish the method identifier from any other identifier, so I've commented out the original block and added a new one just to show you how to get what you want.

You'll have to do the same for other elements you want to retrieve, such as comments, which are currently just skipped. That part is left to you :-)

Now, create a class like this to generate all the stubs

package mypackage;

public class Gen {

    public static void main(String[] args) {
        String[] arg0 = { "-visitor", "/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/Java.g4", "-package", "mypackage" };
        org.antlr.v4.Tool.main(arg0);
    }

}

Run Gen, and you'll get some java code created for you in mypackage.

Now create a Visitor. In this example, the visitor parses its own source file:

package mypackage;

import java.io.FileInputStream;
import java.io.IOException;

import mypackage.JavaParser.MyMethodNameContext;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

/**
 * @author Leonardo Kenji Feb 4, 2014
 */
public class MyVisitor extends JavaBaseVisitor<Void> {

    /**
     * Main Method
     * 
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        // we'll parse this file
        ANTLRInputStream input = new ANTLRInputStream(new FileInputStream("/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/MyVisitor.java"));
        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        // see the grammar -> starting point for parsing a java file
        ParseTree tree = parser.compilationUnit();

        // extends JavaBaseVisitor<Void> and overrides the methods you're interested in
        MyVisitor visitor = new MyVisitor();
        visitor.visit(tree);
    }

    /**
     * some attribute comment
     */
    private String  someAttribute;

    @Override
    public Void visitMyMethodName(MyMethodNameContext ctx) {
        System.out.println("Method name:" + ctx.getText());
        return super.visitMyMethodName(ctx);
    }

}

and that's it.

You'll get something like

Method name:main
Method name:visitMyMethodName

P.S. One more thing: while I was writing this code in Eclipse, I got a strange exception. It is caused by Java 7 and can be fixed by adding the parameters described at this link to your compiler settings: http://java.dzone.com/articles/javalangverifyerror-expecting

Leo
  • In ANTLR 4, never use the `@header` action to set the package. Instead, use `-package {package}` on the command line when you generate code. – Sam Harwell Feb 04 '14 at 12:26
  • Also, you don't need to create a new instance of `ParseTreeWalker`. If you aren't using a custom walker, use `ParseTreeWalker.DEFAULT.walk(listener, tree)` instead. – Sam Harwell Feb 04 '14 at 12:27
  • Also, why on earth are you using your own `Gen` class? Use one of the Ant or Maven build scripts instead. – Sam Harwell Feb 04 '14 at 12:28
  • the Gen class I'll keep, because it's easier than having to explain ant or maven to a newbie when a 1-line command does the job ;-) – Leo Feb 04 '14 at 12:35
  • I'm getting an error while creating the MyVisitor class. [Check this link](http://imgur.com/5xWrixt) and compiling Gen.java didn't produce any errors but I didn't get any stubs or java source code. – user3266901 Feb 04 '14 at 13:47
  • you must compile MyVisitor only after Gen generates the classes – Leo Feb 04 '14 at 13:59
  • How do I install the antlr4 jar? It's giving me an error when I try to run Gen.java – user3266901 Feb 07 '14 at 11:39
  • get from here http://www.antlr.org/download/antlr-4.2-complete.jar – Leo Feb 07 '14 at 11:40
  • Ok done. Thanks a lot for your help by the way. I was totally lost. – user3266901 Feb 07 '14 at 11:56
  • try ParseTree tree = parser.compilationUnit(); – Leo Feb 07 '14 at 11:59
  • ANTLRInputStream is deprecated. Use [CharStreams](http://www.antlr.org/api/Java/org/antlr/v4/runtime/CharStreams.html) to create a [CharStream](http://www.antlr.org/api/Java/org/antlr/v4/runtime/CharStream.html) instance – David Bradley Nov 23 '17 at 16:04
2
grammar Criteria;

@parser::header {
  import java.util.regex.Pattern;
}

options
{
  superClass = ReferenceResolvingParser;
}

@parser::members {

  public CriteriaParser(TokenStream input, Object object) {
    this(input);
    setObject(object);
  }

}

/* Grammar rules */

reference returns [String value]
          : '$.' IDENTIFIER { $value = resolveReferenceValue($IDENTIFIER.text); }
          ;

operand returns [String value]
        : TRUE { $value = $TRUE.text; }
        | FALSE { $value = $FALSE.text; }
        | DECIMAL { $value = $DECIMAL.text; }
        | QUOTED_LITERAL  { $value = $QUOTED_LITERAL.text.substring(1, $QUOTED_LITERAL.text.length() - 1); }
        | reference { $value = $reference.value; }
        ;

operand_list returns [List value]
             @init{ $value = new ArrayList(); }
             : LBPAREN o=operand { $value.add($o.value); } (',' o=operand { $value.add($o.value); })* RBPAREN
             ;

comparison_expression returns [boolean value]
                      : lhs=operand NEQ rhs=operand { $value = !$lhs.value.equals($rhs.value); }
                      | lhs=operand EQ rhs=operand { $value = $lhs.value.equals($rhs.value); }
                      | lhs=operand GT rhs=operand { $value = $lhs.value.compareTo($rhs.value) > 0; }
                      | lhs=operand GE rhs=operand { $value = $lhs.value.compareTo($rhs.value) >= 0; }
                      | lhs=operand LT rhs=operand { $value = $lhs.value.compareTo($rhs.value) < 0; }
                      | lhs=operand LE rhs=operand { $value = $lhs.value.compareTo($rhs.value) <= 0; }
                      ;

in_expression returns [boolean value]
              : lhs=operand IN rhs=operand_list { $value = $rhs.value.contains($lhs.value); };

rlike_expression returns [boolean value]
                 : lhs=operand RLIKE rhs=QUOTED_LITERAL { $value = Pattern.compile($rhs.text.substring(1, $rhs.text.length() - 1)).matcher($lhs.value).matches(); }
                 ;

logical_expression returns [boolean value]
                   : c=comparison_expression { $value = $c.value; }
                   | i=in_expression { $value = $i.value; }
                   | l=rlike_expression { $value = $l.value; }
                   ;

chained_expression returns [boolean value]
                   : e=logical_expression { $value = $e.value; } (OR  c=chained_expression { $value |= $c.value; })?
                   | e=logical_expression { $value = $e.value; } (AND c=chained_expression { $value &= $c.value; })?
                   ;

grouped_expression returns [boolean value]
                   : LCPAREN c=chained_expression { $value = $c.value; } RCPAREN ;

expression returns [boolean value]
           : c=chained_expression { $value = $c.value; } (OR  e=expression { $value |= $e.value; })?
           | c=chained_expression { $value = $c.value; } (AND e=expression { $value &= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (OR  e=expression { $value |= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (AND e=expression { $value &= $e.value; })?
           ;

criteria returns [boolean value]
         : e=expression { $value = $e.value; }
         ;


/* Lexical rules */

AND : 'and' ;
OR  : 'or' ;

TRUE  : 'true' ;
FALSE : 'false' ;

EQ    : '=' ;
NEQ   : '<>' ;
GT    : '>' ;
GE    : '>=' ;
LT    : '<' ;
LE    : '<=' ;
IN    : 'in' ;
RLIKE : 'rlike' ;

LCPAREN : '(' ;
RCPAREN : ')' ;
LBPAREN : '[' ;
RBPAREN : ']' ;

DECIMAL : '-'?[0-9]+('.'[0-9]+)? ;

IDENTIFIER : [a-zA-Z_][a-zA-Z_.0-9]* ;

QUOTED_LITERAL :
                 (  '\''
                    ( ('\\' '\\') | ('\'' '\'') | ('\\' '\'') | ~('\'') )*
                 '\''  )
                ;

WS : [ \r\t\u000C\n]+ -> skip ;
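
One thing worth checking before relying on this grammar, as far as I can tell from reading it: `chained_expression` is right-recursive and treats `and` and `or` the same way, so there is no operator precedence. `a and b or c` would group as `a and (b or c)`, not as `(a and b) or c` the way Java does. The sketch below is plain Java with no ANTLR involved; the class and method names are made up for illustration, and it only mimics the right-to-left folding of the rule so the two groupings can be compared.

```java
import java.util.Arrays;
import java.util.List;

public class ChainGroupingSketch {

    // Mimics: e=logical_expression ((OR | AND) c=chained_expression)?
    // values.get(i) is the i-th operand, ops.get(i) is the operator that follows it.
    static boolean evalRightRecursive(List<Boolean> values, List<String> ops, int i) {
        boolean value = values.get(i);
        if (i == ops.size()) {
            return value; // last operand, nothing follows
        }
        // Everything to the right is evaluated first, then combined with this operand.
        boolean rest = evalRightRecursive(values, ops, i + 1);
        return ops.get(i).equals("or") ? (value | rest) : (value & rest);
    }

    public static void main(String[] args) {
        // Input modelled here: false and true or true
        List<Boolean> values = Arrays.asList(false, true, true);
        List<String> ops = Arrays.asList("and", "or");

        boolean grammarStyle = evalRightRecursive(values, ops, 0); // false and (true or true)
        boolean javaStyle = false && true || true;                 // (false && true) || true

        System.out.println("grammar-style grouping: " + grammarStyle);
        System.out.println("Java precedence:        " + javaStyle);
    }
}
```

If that grouping is not what you want, wrapping sub-expressions in parentheses (the `grouped_expression` rule) makes the intent explicit.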



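import java.util.Optional;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

public class CriteriaEvaluator extends CriteriaBaseListener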
{

    static class CriteriaEvaluatorErrorListener extends BaseErrorListener
    {

        Optional<String> error = Optional.empty();

        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
            error = Optional.of(String.format("Failed to parse at line %d:%d due to %s", line, charPositionInLine + 1, msg));
        }

    }

    public static boolean evaluate(String input, Object argument)
    {
        CriteriaLexer lexer = new CriteriaLexer(new ANTLRInputStream(input));
        CriteriaParser parser = new CriteriaParser(new CommonTokenStream(lexer), argument);
        // Replace the default console listeners with one that records the most recent syntax error
        CriteriaEvaluatorErrorListener errorListener = new CriteriaEvaluatorErrorListener();
        lexer.removeErrorListeners();
        lexer.addErrorListener(errorListener);
        parser.removeErrorListeners();
        parser.addErrorListener(errorListener);
        CriteriaParser.CriteriaContext criteriaCtx = parser.criteria();
        if(errorListener.error.isPresent())
        {
            throw new IllegalArgumentException(errorListener.error.get());
        }
        else
        {
            return criteriaCtx.value;
        }
    }

}
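
A caveat on the comparison rules, based on my reading of the grammar: `operand` returns a `String`, so `>`, `>=`, `<` and `<=` all go through `String.compareTo`, which is lexicographic. Numeric operands may therefore not order the way you expect. The sketch below is plain Java; `compareNumericIfPossible` is a hypothetical helper that is not part of the answer, shown as one possible way to compare numerically when both sides parse as decimals.

```java
import java.math.BigDecimal;

public class OperandComparisonSketch {

    // Hypothetical helper: compare as numbers when both operands parse as decimals,
    // otherwise fall back to the lexicographic String comparison.
    static int compareNumericIfPossible(String lhs, String rhs) {
        try {
            return new BigDecimal(lhs).compareTo(new BigDecimal(rhs));
        } catch (NumberFormatException e) {
            return lhs.compareTo(rhs);
        }
    }

    public static void main(String[] args) {
        // Lexicographic: "10" sorts before "9" because '1' < '9'
        System.out.println("\"10\" > \"9\" as strings: " + ("10".compareTo("9") > 0));
        // Numeric: 10 is greater than 9
        System.out.println("10 > 9 as numbers:     " + (compareNumericIfPossible("10", "9") > 0));
    }
}
```

The first line prints `false` and the second prints `true`. A helper like this could be called from the grammar actions in place of the direct `compareTo` calls, if numeric ordering is what you need.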
sample
  • Welcome to Stack Overflow! While this code snippet may solve the question, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – NathanOliver Jan 22 '16 at 17:50
-1

Here is a detailed example (borrowed from https://github.com/satnam-sandhu/ASTGenerator); I made some changes so that it also records line numbers.

helloworld.java

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}

JavaAstGeneratorDOT.java

import antlr.Java8Lexer;
import antlr.Java8Parser;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.misc.Interval;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;


public class JavaAstGeneratorDOT {

    static ArrayList<String> LineNum = new ArrayList<String>();
    static ArrayList<String> Type = new ArrayList<String>();
    static ArrayList<String> Content = new ArrayList<String>();
    static ArrayList<String> RawLineNum = new ArrayList<String>();

    private static String readFile(String pathname) throws IOException {
        File file = new File(pathname);
        byte[] encoded = Files.readAllBytes(file.toPath());
        return new String(encoded, Charset.forName("UTF-8"));
    }

    public static void main(String args[]) throws IOException {
        String path = "helloworld.java";
        String inputString = readFile(path);
        ANTLRInputStream input = new ANTLRInputStream(inputString);
        Java8Lexer lexer = new Java8Lexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParserRuleContext ctx = parser.compilationUnit();
//      ParserRuleContext ctx = parser.statementExpressionList();
//      ParserRuleContext ctx = parser.methodDeclaration();

        generateAST(ctx, false, 0, tokens);
        String filename = path.substring(path.lastIndexOf("\\") + 1, path.lastIndexOf("."));
        String save_dot_filename = String.format("ast_%s.dot", filename);
        PrintWriter writer = new PrintWriter(save_dot_filename);
        writer.println(String.format("digraph %s {", filename));
        printDOT(writer);
        writer.println("}");
        writer.close();
    }

    private static void generateAST(RuleContext ctx, boolean verbose, int indentation, CommonTokenStream tokens) {
        boolean toBeIgnored = !verbose && ctx.getChildCount() == 1 && ctx.getChild(0) instanceof ParserRuleContext;
        if (!toBeIgnored) {
            String ruleName = Java8Parser.ruleNames[ctx.getRuleIndex()];
            LineNum.add(Integer.toString(indentation));
            Type.add(ruleName);
            Content.add(ctx.getText());

            // get line number, added by tsmc.sumihui, 20190425
            Interval sourceInterval = ctx.getSourceInterval();
            Token firstToken = tokens.get(sourceInterval.a);
            int lineNum = firstToken.getLine();
            RawLineNum.add(Integer.toString(lineNum));
        }
        for (int i = 0; i < ctx.getChildCount(); i++) {
            ParseTree element = ctx.getChild(i);
            if (element instanceof RuleContext) {
                generateAST((RuleContext) element, verbose, indentation + (toBeIgnored ? 0 : 1), tokens);
            }
        }
    }

    private static void printDOT(PrintWriter writer) {
        printLabel(writer);
        int pos = 0;
        for (int i = 1; i < LineNum.size(); i++) {
            pos = getPos(Integer.parseInt(LineNum.get(i)) - 1, i);
            writer.println((Integer.parseInt(LineNum.get(i)) - 1) + Integer.toString(pos) + "->" + LineNum.get(i) + i);
        }
    }

    private static void printLabel(PrintWriter writer) {
        for (int i = 0; i < LineNum.size(); i++) {
//          writer.println(LineNum.get(i)+i+"[label=\""+Type.get(i)+"\\n "+Content.get(i)+" \"]");
            writer.println(LineNum.get(i) + i + "[label=\"" + Type.get(i) + "\", linenum=\"" + RawLineNum.get(i) + "\"]");
        }
    }

    private static int getPos(int n, int limit) {
        int pos = 0;
        for (int i = 0; i < limit; i++) {
            if (Integer.parseInt(LineNum.get(i)) == n) {
                pos = i;
            }
        }
        return pos;
    }
}
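
A small note on `main` above: the graph name is derived by looking for a backslash only, so with a path such as `src/helloworld.java` the name would come out as `src/helloworld`, which is not usable as a DOT graph name and points the output file into a directory that may not exist. The sketch below is one possible alternative using `java.nio.file`; `baseName` is a hypothetical helper, not something from the original project, and directory separators are interpreted according to the platform you run on.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BaseNameSketch {

    // Hypothetical helper: file name without directories and without the last extension.
    static String baseName(String pathname) {
        Path fileName = Paths.get(pathname).getFileName();
        String name = fileName == null ? "" : fileName.toString();
        int dot = name.lastIndexOf('.');
        // Keep names like ".gitignore" intact (dot at position 0)
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        System.out.println(baseName("helloworld.java"));
        System.out.println(baseName("src/helloworld.java"));
    }
}
```

Both calls print `helloworld`.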

The result looks like this (ast_helloworld.dot):

digraph helloworld {
00[label="compilationUnit", linenum="1"]
11[label="normalClassDeclaration", linenum="1"]
22[label="classModifier", linenum="1"]
23[label="classBody", linenum="1"]
34[label="methodDeclaration", linenum="2"]
45[label="methodModifier", linenum="2"]
46[label="methodModifier", linenum="2"]
47[label="methodHeader", linenum="2"]
58[label="result", linenum="2"]
59[label="methodDeclarator", linenum="2"]
610[label="formalParameter", linenum="2"]
711[label="unannArrayType", linenum="2"]
812[label="unannClassType_lfno_unannClassOrInterfaceType", linenum="2"]
813[label="dims", linenum="2"]
714[label="variableDeclaratorId", linenum="2"]
415[label="block", linenum="2"]
516[label="expressionStatement", linenum="3"]
617[label="methodInvocation", linenum="3"]
718[label="typeName", linenum="3"]
819[label="packageOrTypeName", linenum="3"]
720[label="literal", linenum="3"]
00->11
11->22
11->23
23->34
34->45
34->46
34->47
47->58
47->59
59->610
610->711
711->812
711->813
610->714
34->415
415->516
516->617
617->718
718->819
617->720
}
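
Reading the output above: each node ID is the node's depth followed by its pre-order index (so `610` is depth 6, index 10), and `printDOT` links every node to the closest earlier node that is exactly one level shallower. The sketch below is plain Java with hypothetical names (`parentIndex`, `nodeId`). It reproduces that parent lookup on the depths of the first six nodes from the output, and it uses an `n<depth>_<index>` ID. That ID format is my own assumption rather than what the answer does: plain concatenation could in principle become ambiguous on larger trees, since depth 1 with index 212 and depth 12 with index 12 both give `1212`.

```java
import java.util.Arrays;
import java.util.List;

public class DotIdSketch {

    // Parent of node i in a pre-order depth list: the closest earlier node
    // whose depth is exactly one less. Returns -1 for the root.
    static int parentIndex(List<Integer> depths, int i) {
        int wanted = depths.get(i) - 1;
        for (int j = i - 1; j >= 0; j--) {
            if (depths.get(j) == wanted) {
                return j;
            }
        }
        return -1;
    }

    // Hypothetical ID format with a separator, so depth and index cannot run together.
    static String nodeId(int depth, int index) {
        return "n" + depth + "_" + index;
    }

    public static void main(String[] args) {
        // Depths of nodes 0..5 in the output above (IDs 00, 11, 22, 23, 34, 45)
        List<Integer> depths = Arrays.asList(0, 1, 2, 2, 3, 4);
        for (int i = 1; i < depths.size(); i++) {
            int p = parentIndex(depths, i);
            System.out.println(nodeId(depths.get(p), p) + " -> " + nodeId(depths.get(i), i));
        }
    }
}
```

The edges printed correspond to `00->11`, `11->22`, `11->23`, `23->34` and `34->45` in the original output.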