1

I am using ANTLR to create an and/or parser+evaluator. Expressions will have the format like:

  • x eq 1 && y eq 10
  • (x lt 10 && x gt 1) OR x eq -1

I was reading this post on logic expressions in ANTLR Looking for advice on project. Parsing logical expression and I found the grammar posted there a good start:

grammar Logic;

parse
  :  expression EOF
  ;

expression
  :  implication
  ;

implication
  :  or ('->' or)*
  ;

or
  :  and ('&&' and)*
  ;

and
  :  not ('||' not)*
  ;

not
  :  '~' atom
  |  atom
  ;

atom
  :  ID
  |  '(' expression ')'
  ;

ID    : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};

However, while getting a tree from the parser works for expressions where the variables are just one character (ie, "(A || B) AND C", I am having a hard time adapting this to my case (in the example "x eq 1 && y eq 10" I'd expect one "AND" parent and two children, "x eq 1" and "y eq 10", see the test case below).

@Test
public void simpleAndEvaluation() throws RecognitionException{
    String src = "1 eq 1 && B";

    LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
    LogicParser parser = new LogicParser(new CommonTokenStream(lexer));


    CommonTree tree = (CommonTree)parser.parse().getTree();

    assertEquals("&&",tree.getText());
    assertEquals("1 eq 1",tree.getChild(0).getText());
    assertEquals("a neq a",tree.getChild(1).getText());
}

I believe this is related with the "ID". What would the correct syntax be?

Community
  • 1
  • 1
mmalmeida
  • 1,037
  • 9
  • 27

2 Answers2

2

For those interested, I made some improvements in my grammar file (see bellow)

Current limitations:

  • only works with &&/||, not AND/OR (not very problematic)

  • you can't have spaces between the parenthesis and the &&/|| (I solve that by replacing " (" with ")" and ") " with ")" in the source String before feeding the lexer)

    grammar Logic;

    options {
      output = AST;
    }
    
    tokens {
      AND = '&&';
      OR  = '||';
      NOT = '~';
    }
    
    // parser/production rules start with a lower case letter
    parse
      :  expression EOF!    // omit the EOF token
      ;
    
    expression
      :  or
      ;
    
    or
      :  and (OR^ and)*    // make `||` the root
      ;
    
    and
      :  not (AND^ not)*      // make `&&` the root
      ;
    
    not
      :  NOT^ atom    // make `~` the root
      |  atom
      ;
    
    atom
      :  ID
      |  '('! expression ')'!    // omit both `(` and `)`
      ;
    
    // lexer/terminal rules start with an upper case letter
    ID
      :
        (
        'a'..'z'
        | 'A'..'Z'
        | '0'..'9' | ' '
        | SYMBOL
      )+ 
      ;
    
    SYMBOL
      :
        ('+'|'-'|'*'|'/'|'_')
     ;
    
mmalmeida
  • 1,037
  • 9
  • 27
0
ID    : ('a'..'z' | 'A'..'Z')+;

states that an identifier is a sequence of one or more letters, but does not allow any digits. Try

ID    : ('a'..'z' | 'A'..'Z' | '0'..'9')+;

which will allow e.g. abc, 123, 12ab, and ab12. If you don't want the latter types, you'll have to restructure the rule a little bit (left as a challenge...)

In order to accept arbitrarily many identifiers, you could define atom as ID+ instead of ID.

Also, you will likely need to specify AND, OR, -> and ~ as tokens so that, as @Bart Kiers says, the first two won't get classified as ID, and so that the latter two will get recognized at all.

Aasmund Eldhuset
  • 37,289
  • 4
  • 68
  • 81
  • @Aasmund, that doesn't take into consideration 2 things I believe: the fact that a token can have spaces ("green neq green" should be one token). Also, using AND/OR instead of &&/||, shouldn't one need to say in the ID something like ~and && ~or (or ~"AND", ~"OR") ? – mmalmeida Mar 01 '12 at 09:39
  • @AasmundEldhuset: If you change the atom to ID+ but don't add "| ' '" to the ID, aren't you saying you expect IDID...ID, ie, 1eq1 (without the spaces)? That grammar tokenizes 1 eq 1 || B into a parent || and 4 children ( 1,eq,1,B), not 2 children (1 eq 1,B). – mmalmeida Mar 01 '12 at 14:30