I'm using ANTLR v3 to parse input that looks like this:

* this is an outline item at level 1
** item at level 2
*** item at level 3
* another item at level 1
* an item with *bold* text

Stars at the beginning of a line mark the start of an outline item. Stars can also be part of an item's text (e.g. *bold*).

This is the grammar to parse outline items without support for stars in the item text:

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;
OUTLINE_ITEM_MARKER: STAR_IN_COLUMN_ZERO STAR* (' '|'\t');
ITEM_TEXT: ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t')+;
fragment STAR_IN_COLUMN_ZERO: {getCharPositionInLine()==0}? '*';
fragment STAR: {getCharPositionInLine()>0}? '*';

For the input *** foo bar ANTLR produces the following parse tree:

[Image: parse tree without star in item text]

So far this works as expected. Now I'm trying to add star to the possible characters for the item text, so I changed the lexer rule for ITEM_TEXT to the following:

ITEM_TEXT: ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t'|STAR)+;

Now for the same input the following parse tree is produced:

[Image: parse tree with star in item text]

This is the output in ANTLRWorks:

input.txt line 1:0 rule STAR failed predicate: {getCharPositionInLine()>0}?
input.txt line 1:1 missing OUTLINE_ITEM_MARKER at '** foo bar'

It seems that OUTLINE_ITEM_MARKER didn't match, which led to a MissingTokenException. What's wrong with the grammar, and what do I need to change to allow stars to be part of ITEM_TEXT?

paprika

2 Answers

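Instead of a validating semantic predicate, use a gated semantic predicate [1] in your fragments. A validating predicate ({...}?) is only checked once the lexer has already committed to the rule, so a false predicate raises an error. A gated predicate ({...}?=>) takes part in the prediction itself, so the rule is simply not considered when the predicate is false.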

The following grammar:

grammar Test;

outline_items
 : outline_item+ EOF
 ;

outline_item
 : OUTLINE_ITEM_MARKER ITEM_TEXT
 ;

OUTLINE_ITEM_MARKER 
 : STAR_IN_COLUMN_ZERO STAR* (' '|'\t')
 ;

ITEM_TEXT
 : ('a'..'z'|'A'..'Z'|'0'..'9'|'\r'|'\n'|' '|'\t'|STAR)+
 ;

fragment STAR_IN_COLUMN_ZERO
 : {getCharPositionInLine()==0}?=> '*'
 ;

fragment STAR
 : {getCharPositionInLine()>0}?=> '*'
 ;

Your input:

* this is an outline item at level 1
** item at level 2
*** item at level 3
* another item at level 1
* an item with *bold* text

will then be parsed as this:

[Image: parse tree of the input above]

[1] What is a 'semantic predicate' in ANTLR?
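
To make the effect of the two column checks easier to see outside of ANTLR, here is a minimal hand-written sketch in Java. It is not the code ANTLR generates and not part of the grammar above. The class name OutlineLexerSketch, the method tokenize, and the MARKER[...]/TEXT[...] output format are all made up for illustration. It is also more lenient than the grammar: the text token accepts any character, and a marker without a trailing space or tab is still emitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-rolled lexer that treats '*' differently depending on its column. */
class OutlineLexerSketch {

    /** Splits the input into "MARKER[...]" and "TEXT[...]" strings. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int column = 0; // plays the role of getCharPositionInLine()

        while (pos < input.length()) {
            int start = pos;
            if (input.charAt(pos) == '*' && column == 0) {
                // Marker: a star in column zero, any further stars,
                // then at most one space or tab.
                while (pos < input.length() && input.charAt(pos) == '*') {
                    pos++;
                    column++;
                }
                if (pos < input.length()
                        && (input.charAt(pos) == ' ' || input.charAt(pos) == '\t')) {
                    pos++;
                    column++;
                }
                tokens.add("MARKER[" + input.substring(start, pos) + "]");
            } else {
                // Text: consume everything, including stars, until a star
                // shows up in column zero. Newlines reset the column.
                while (pos < input.length()
                        && !(input.charAt(pos) == '*' && column == 0)) {
                    column = (input.charAt(pos) == '\n') ? 0 : column + 1;
                    pos++;
                }
                tokens.add("TEXT[" + input.substring(start, pos) + "]");
            }
        }
        return tokens;
    }
}
```

Calling OutlineLexerSketch.tokenize("* a *b* c\n** d") keeps the stars around b inside the text token, while the stars that open each line end up in marker tokens. That is the same decision the gated predicates make in the lexer rules: the column decides which rule a '*' may belong to before any rule is chosen.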

Bart Kiers

Have you tried making your grammar simpler?

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;

ITEM_TEXT:
    (' '|'\t') (' '|'\t'|'a'..'z'|'A'..'Z'|'0'..'9'| STAR)+
;

OUTLINE_ITEM_MARKER:
    STAR+ 
;

fragment STAR:   
    '*'
;

Or if you don't need to keep STAR as an explicit fragment, and you want to capture all characters in the item text, and not a subset:

outline_item: OUTLINE_ITEM_MARKER ITEM_TEXT;

ITEM_TEXT:
    (' '|'\t') (~('\n'|'\r'))+
;

OUTLINE_ITEM_MARKER:
    '*'+ 
;
ironchefpython
  • Indeed, that simplifies the grammar quite a bit... However, your grammar does not make a distinction between a `*` at the start of a line, and one elsewhere: something the OP is trying to do. – Bart Kiers Feb 11 '12 at 19:00
  • @BartKiers Please read the provided grammar (or better yet test it in ANTLRWorks) before making that assumption. – ironchefpython Feb 11 '12 at 20:00
  • Note that I did not say your suggestion didn't work. Sure, it works with just a few rules, but I highly doubt that the OP is doing only that: this could be done without the help of a full-blown recursive descent parser. The OP's question is how to distinguish between two identical characters (* in this case) depending on where they appear in the input. This is something you do not address in the lexer rules. – Bart Kiers Feb 11 '12 at 20:22
  • Bart Kiers' assumption is correct, I'd like to make a distinction based on the location of the character. I tried to isolate the problem to the minimum, so I left out the fact that I'd also like to parse '*bold*' later on with the same grammar. – paprika Feb 11 '12 at 20:40