Antlr parser for and/or logic - how to get expressions between logic operators?

Question

I am using ANTLR to create an and/or parser+evaluator. Expressions will have the format like:

x eq 1 && y eq 10
(x lt 10 && x gt 1) OR x eq -1

I was reading this post on logic expressions in ANTLR Looking for advice on project. Parsing logical expression and I found the grammar posted there a good start:

grammar Logic;

parse
  :  expression EOF
  ;

expression
  :  implication
  ;

implication
  :  or ('->' or)*
  ;

or
  :  and ('&&' and)*
  ;

and
  :  not ('||' not)*
  ;

not
  :  '~' atom
  |  atom
  ;

atom
  :  ID
  |  '(' expression ')'
  ;

ID    : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};

However, while getting a tree from the parser works for expressions where the variables are just one character (ie, "(A || B) AND C", I am having a hard time adapting this to my case (in the example "x eq 1 && y eq 10" I'd expect one "AND" parent and two children, "x eq 1" and "y eq 10", see the test case below).

@Test
public void simpleAndEvaluation() throws RecognitionException{
    String src = "1 eq 1 && B";

    LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
    LogicParser parser = new LogicParser(new CommonTokenStream(lexer));


    CommonTree tree = (CommonTree)parser.parse().getTree();

    assertEquals("&&",tree.getText());
    assertEquals("1 eq 1",tree.getChild(0).getText());
    assertEquals("a neq a",tree.getChild(1).getText());
}

I believe this is related with the "ID". What would the correct syntax be?

Posted here as well: http://antlr.markmail.org/message/bk6kutcu56vuglgl — Bart Kiers, Mar 01 '12 at 18:52

mmalmeida · Accepted Answer · 2012-03-05T10:55:03.127

For those interested, I made some improvements in my grammar file (see bellow)

Current limitations:

only works with &&/||, not AND/OR (not very problematic)

you can't have spaces between the parenthesis and the &&/|| (I solve that by replacing " (" with ")" and ") " with ")" in the source String before feeding the lexer)

grammar Logic;

options {
  output = AST;
}

tokens {
  AND = '&&';
  OR  = '||';
  NOT = '~';
}

// parser/production rules start with a lower case letter
parse
  :  expression EOF!    // omit the EOF token
  ;

expression
  :  or
  ;

or
  :  and (OR^ and)*    // make `||` the root
  ;

and
  :  not (AND^ not)*      // make `&&` the root
  ;

not
  :  NOT^ atom    // make `~` the root
  |  atom
  ;

atom
  :  ID
  |  '('! expression ')'!    // omit both `(` and `)`
  ;

// lexer/terminal rules start with an upper case letter
ID
  :
    (
    'a'..'z'
    | 'A'..'Z'
    | '0'..'9' | ' '
    | SYMBOL
  )+ 
  ;

SYMBOL
  :
    ('+'|'-'|'*'|'/'|'_')
 ;

Aasmund Eldhuset · Answer 2 · 2012-03-01T11:57:30.840

0

ID    : ('a'..'z' | 'A'..'Z')+;

states that an identifier is a sequence of one or more letters, but does not allow any digits. Try

ID    : ('a'..'z' | 'A'..'Z' | '0'..'9')+;

which will allow e.g. abc, 123, 12ab, and ab12. If you don't want the latter types, you'll have to restructure the rule a little bit (left as a challenge...)

In order to accept arbitrarily many identifiers, you could define atom as ID+ instead of ID.

Also, you will likely need to specify AND, OR, -> and ~ as tokens so that, as @Bart Kiers says, the first two won't get classified as ID, and so that the latter two will get recognized at all.

edited Mar 01 '12 at 11:57

answered Mar 01 '12 at 00:56

Aasmund Eldhuset

37,289
4
68
81

@Aasmund, that doesn't take into consideration 2 things I believe: the fact that a token can have spaces ("green neq green" should be one token). Also, using AND/OR instead of &&/||, shouldn't one need to say in the ID something like ~and && ~or (or ~"AND", ~"OR") ? – mmalmeida Mar 01 '12 at 09:39
@AasmundEldhuset: If you change the atom to ID+ but don't add "| ' '" to the ID, aren't you saying you expect IDID...ID, ie, 1eq1 (without the spaces)? That grammar tokenizes 1 eq 1 || B into a parent || and 4 children ( 1,eq,1,B), not 2 children (1 eq 1,B). – mmalmeida Mar 01 '12 at 14:30

Antlr parser for and/or logic - how to get expressions between logic operators?

2 Answers2

Linked