I'm trying to build a simple grammar to parse a .NET type name string, with support for generics. I admit to being completely new to building grammars in any language. A type string might look like this:

Foo.Bar.Blah(Mom.Dad, Son.Daughter(Frank.Bob), Dog)

Basically, it's recursive: the arguments of a type are themselves types. Y'all should understand this.

I'm completely out in the woods with this one. Not sure how to begin. What I've built currently, which doesn't actually work, is this:

tree grammar XmlTypeName;

options {
    language=CSharp2;
}

RPAREN
  : '('
  ;

LPAREN
  : ')'
  ;

SEP
  : ','
  ;

TYPE  
  : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+
  ;

prog
  : type;

type
  : TYPE (RPAREN type (SEP type)? LPAREN)? (EOF)?
  ;

This doesn't even get close to working. Antlr3.exe reports errors saying that RPAREN and LPAREN aren't allowed in a tree parser. Is a tree parser even what I need?

I'd like to produce a simple AST that lets me navigate down the types.

Jerome Haltom

1 Answer

No, you shouldn't use a tree grammar. A tree grammar is used after a parser has created an AST. Simply remove the keyword tree from it.

A couple of other remarks:

  • you want to match one or more comma-separated types inside parentheses, but you used type (SEP type)?, which matches only one or two types. You'll need type (SEP type)* instead;
  • you didn't account for the . inside the types;
  • you should discard literal spaces in the lexer.
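The first remark can be checked with a quick Python sketch. This is only a regex analogy, not ANTLR: `T` stands in for one type and `,` for SEP, and the names `one_or_two`, `one_or_more` and `accepts` are made up for the illustration. The `?` and `*` quantifiers behave the same way here as in a grammar rule.

```python
import re

# 'T' stands in for one 'type'; ',' stands in for SEP.
one_or_two = re.compile(r"T(,T)?")   # mirrors: type (SEP type)?
one_or_more = re.compile(r"T(,T)*")  # mirrors: type (SEP type)*


def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


for sample in ["T", "T,T", "T,T,T"]:
    print(sample, accepts(one_or_two, sample), accepts(one_or_more, sample))
# T True True
# T,T True True
# T,T,T False True
```

With `?`, a third type in the list is rejected; with `*`, any number of additional types is accepted.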

Something like this will do the trick, most probably:

grammar XmlTypeName;

options {
  language=CSharp2;
}

prog
 : type EOF
 ;

type
 : name (LPAREN type (SEP type)* RPAREN)?
 ;

name
 : ID (DOT ID)*
 ;

LPAREN
 : '('
 ;

RPAREN
 : ')'
 ;

SEP
 : ','
 ;

DOT
 : '.'
 ;

ID  
 : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+
 ;

SPACE
 : (' '|'\t')+ {Skip();} // if 'Skip()' doesn't work, try 'skip()'
 ;
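To get a feel for what the lexer rules do with an input, here is a rough Python sketch of the same token definitions. It is a hand-written stand-in, not code generated by ANTLR, and `TOKEN_SPEC`, `MASTER` and `tokenize` are names invented for this example. The labels are the sketch's own: LPAREN is `(` and RPAREN is `)`.

```python
import re

# Labels are this sketch's own: LPAREN is '(' and RPAREN is ')'.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z0-9_]+"),
    ("DOT", r"\."),
    ("SEP", r","),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SPACE", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, text) pairs, dropping SPACE tokens like the lexer's skip action."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("Foo.Bar(A, B)"))
```

The space after the comma never reaches the token list, which is what discarding whitespace in the lexer amounts to.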

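However, the above just creates a flat list of tokens. If you want to create a proper AST, you need to "tell" ANTLR which nodes/tokens are root tokens, and which ones to discard (like the commas and parentheses). With output=AST set, a rewrite rule of the form -> ^(ROOT child ...) makes ROOT the parent of the listed children, and any token left out of the rewrite is dropped from the tree: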

grammar XmlTypeName;

options {
  output=AST;
  language=CSharp2;
}

tokens {
  TYPE;
  NAME;
}

prog
 : type EOF -> type
 ;

type
 : name (LPAREN type (SEP type)* RPAREN)? -> ^(TYPE name type*)
 ;

name
 : ID (DOT ID)* -> ^(NAME ID+)
 ;

LPAREN
 : '('
 ;

RPAREN
 : ')'
 ;

SEP
 : ','
 ;

DOT
 : '.'
 ;

ID  
 : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+
 ;

SPACE
 : (' '|'\t')+ {Skip();} // if 'Skip()' doesn't work, try 'skip()'
 ;

which creates the following AST:

[image: diagram of the resulting AST]
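For readers who cannot see the image, the tree shape can be approximated with a small hand-written recursive-descent parser in Python. This is a sketch under assumptions, not ANTLR output: `TypeNameParser`, `tokenize`, `parse` and `show` are hypothetical names, and the nested tuples are meant to mirror the `^(TYPE name type*)` and `^(NAME ID+)` rewrites.

```python
import re

# One token per match: an identifier, or one of the characters ( ) , .
# Leading spaces and tabs are skipped, as the SPACE rule does.
TOKEN_RE = re.compile(r"[ \t]*([A-Za-z0-9_]+|[().,])")
PUNCTUATION = {"(", ")", ",", "."}


def tokenize(text):
    text = text.rstrip(" \t")
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class TypeNameParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, char):
        if self.peek() != char:
            raise SyntaxError(f"expected {char!r}, got {self.peek()!r}")
        self.pos += 1

    def expect_id(self):
        token = self.peek()
        if token is None or token in PUNCTUATION:
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def parse_name(self):
        # name : ID (DOT ID)*  ->  ^(NAME ID+)   (the dots are dropped)
        ids = [self.expect_id()]
        while self.peek() == ".":
            self.pos += 1
            ids.append(self.expect_id())
        return ("NAME", *ids)

    def parse_type(self):
        # type : name ('(' type (',' type)* ')')?  ->  ^(TYPE name type*)
        node = ["TYPE", self.parse_name()]
        if self.peek() == "(":
            self.pos += 1
            node.append(self.parse_type())
            while self.peek() == ",":
                self.pos += 1
                node.append(self.parse_type())
            self.expect(")")
        return tuple(node)

    def parse_prog(self):
        # prog : type EOF  ->  type
        tree = self.parse_type()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return tree


def parse(text):
    return TypeNameParser(tokenize(text)).parse_prog()


def show(node):
    """Render nested tuples in a LISP-like form, root first."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(show(child) for child in node) + ")"


print(show(parse("Foo.Bar.Blah(Mom.Dad, Son.Daughter(Frank.Bob), Dog)")))
# (TYPE (NAME Foo Bar Blah) (TYPE (NAME Mom Dad)) (TYPE (NAME Son Daughter) (TYPE (NAME Frank Bob))) (TYPE (NAME Dog)))
```

Each TYPE node has a NAME child followed by zero or more nested TYPE children, and the parentheses, commas and dots do not appear in the tree.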

More info about creating ASTs with ANTLR: How to output the AST built using ANTLR?

Bart Kiers
  • Thanks! After about 2 hours on this, I managed to figure out that I needed to split out ID, and the 'name' parsing much in the same way as the type list ',' parsing. But that just produced a flat tree. I'm still a little unsure what the syntax here actually means. -> is a rewrite, so it alters the tokens out of the lexer... got that... but what's the meaning of ^()? – Jerome Haltom Mar 08 '12 at 20:07
  • No problem. The AST operators are explained in the link I posted at the end of my answer. – Bart Kiers Mar 08 '12 at 20:09
  • Also, it's actually emitting NAME '(' TYPE ',' TYPE ',' TYPE ')' in the AST. How do I have it hide the parentheses from the AST? Or should I even bother? – Jerome Haltom Mar 08 '12 at 20:09
  • The way I showed it, the `'('`, `')'` and `','` *are* omitted from the AST: try it. Also read the link I posted at the end: it will explain how to omit or include tokens in the AST. – Bart Kiers Mar 08 '12 at 20:10
  • @wasabi, if you're using my second example and you're seeing the parentheses and commas, you're probably using ANTLRWorks' interpreter: don't, it's a bit buggy. Use ANTLRWorks' debugger instead (which shows the actual AST that is created, not only the parse tree). – Bart Kiers Mar 08 '12 at 20:22