
I've read that you need to use the '^' and '!' operators to build a parse tree like the ones displayed in ANTLR Works (even though ANTLR Works shows a nice tree without them). My question is: how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrite rules, and yet I'm still stuck. Say I have the input string abc abc123 and this grammar:

grammar test;

program : idList;
idList : id* ;
id : ID ;

ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;

ANTLR Works will output:

[Image: ANTLR Works interpreter output]

What I don't understand is how you can get the 'idList' node at the top of this tree (and the grammar node too, for that matter). How can I reproduce this tree using rewrites and those operators?

Chris Covert

1 Answer

What I don't understand is how you can get the 'idList' node on top of this tree (as well as the grammar one as a matter of fact). How can I reproduce this tree using rewrites and those operators?

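You can't do this with ^ and ! alone. These operators only act on tokens the parser actually matched: ^ makes a token the root of its subtree and ! leaves a token out of the tree. You, however, want to create extra tokens and make them the roots of your subtrees. You can do that with rewrite rules and a few imaginary tokens, declared in the tokens { ... } section.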

A quick demo:

grammar test;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
  IdList;
  Id;
}

@parser::members {

  private static void walk(CommonTree tree, int indent) {
    if(tree == null) return;
    for(int i = 0; i < indent; i++) {
      System.out.print("    ");
    }
    System.out.println(tree.getText());
    for(int i = 0; i < tree.getChildCount(); i++) {
      walk((CommonTree)tree.getChild(i), indent + 1);
    }
  }

  public static void main(String[] args) throws Exception {
    testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
    testParser parser = new testParser(new CommonTokenStream(lexer));
    walk((CommonTree)parser.program().getTree(), 0);
  }
}

program : idList EOF -> idList;
idList  : id*        -> ^(IdList id*);
id      : ID         -> ^(Id ID);

ID    : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};

fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT  : '0' .. '9';

If you run the demo above, you will see the following being printed to the console:

IdList
    Id
        abc
    Id
        abc123

As you can see, imaginary tokens must also start with an uppercase letter, just like lexer rules. By default, an imaginary token's text is its own name. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:

idList  : id*        -> ^(IdList["idList"] id*);
id      : ID         -> ^(Id["id"] ID);

which will print:

idList
    id
        abc
    id
        abc123
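
To make the shape of the resulting tree easier to picture, here is a minimal plain-Java sketch that builds the same structure by hand and prints it with the same indentation as walk() above. It does not use the ANTLR runtime: the Node class, the build() helper, and the class name ImaginaryTokenSketch are hypothetical stand-ins invented for illustration, not part of ANTLR's API. The point it tries to show is that the IdList and Id nodes come from no input text at all, while only the leaves (abc, abc123) correspond to matched tokens.

```java
import java.util.ArrayList;
import java.util.List;

final class ImaginaryTokenSketch {

    // Hypothetical stand-in for an AST node; this is NOT ANTLR's CommonTree.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Same idea as walk() in the grammar: one line per node,
    // four spaces of indentation per tree level.
    static void render(Node node, int indent, StringBuilder out) {
        for (int i = 0; i < indent; i++) {
            out.append("    ");
        }
        out.append(node.text).append('\n');
        for (Node child : node.children) {
            render(child, indent + 1, out);
        }
    }

    // Mirrors the shape of ^(IdList id*) where each id is ^(Id ID):
    // listText and idText play the role of the imaginary tokens' text,
    // ids are the texts of the real ID tokens from the input.
    static String build(String listText, String idText, String... ids) {
        Node root = new Node(listText);
        for (String id : ids) {
            root.children.add(new Node(idText, new Node(id)));
        }
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Default text: the imaginary token's own name.
        System.out.print(build("IdList", "Id", "abc", "abc123"));
        // Overridden text, as with IdList["idList"] and Id["id"].
        System.out.print(build("idList", "id", "abc", "abc123"));
    }
}
```

Running main prints the two trees shown above, one after the other. In the real grammar, ANTLR creates these root nodes for you from the rewrite rules; the sketch only imitates the end result.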
Bart Kiers