Just as reluctant quantifiers work in regular expressions, I'm trying to parse two different tokens from my input, i.e. one for operand1 and one for the operator. The operator token should be matched reluctantly, instead of being greedily consumed as part of the operand1 token.

Example input:

Active Indicator in ("A", "D", "S")

(To simplify, I have removed the code relevant to operand2.)

Expected operand1:

Active Indicator

Expected operator:

in

Actual output for operand1:

Active Indicator in

and none for the operator rule. Below is my grammar code:

grammar Test;

condition: leftOperand WHITESPACE* operator;

leftOperand:  ALPHA_NUMERIC_WS ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;

EQUALS  : '=';
NOT_EQUALS  : '!=';
IN  : 'in';
NOT_IN  : 'not' WHITESPACE 'in';

WORD: (LOWERCASE | UPPERCASE )+ ;
ALPHA_NUMERIC_WS:    WORD  ( WORD| DIGIT | WHITESPACE )* ( WORD | DIGIT)+ ;
WHITESPACE  : (' ' | '\t')+;

fragment DIGIT: '0'..'9' ;

LOWERCASE   : [a-z] ;
UPPERCASE   : [A-Z] ;
ImGroot

1 Answer

One solution is to produce one token per word rather than a single token spanning several words.
Your grammar would then look like this:

grammar Test;

condition: leftOperand operator;

leftOperand:  ALPHA_NUMERIC+ ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;

EQUALS  : '=';
NOT_EQUALS  : '!=';
IN  : 'in';
NOT_IN  : 'not' WHITESPACE 'in';

WORD: (LOWERCASE | UPPERCASE )+ ;
ALPHA_NUMERIC:    WORD  ( WORD| DIGIT)* ;
WHITESPACE  : (' ' | '\t')+ -> skip; // ignoring WS completely

fragment DIGIT: '0'..'9' ;

LOWERCASE   : [a-z] ;
UPPERCASE   : [A-Z] ;

This way the lexer can no longer match the whole input as a single ALPHA_NUMERIC_WS token, because any whitespace that occurs forces the lexer to leave the ALPHA_NUMERIC rule and end the token. Any input that follows then gets a chance to be matched by the other lexer rules (in the order they are defined in the grammar).
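To make the difference concrete, here is a rough Python model of how a lexer picks tokens: the longest match wins, and ties go to the rule defined first. It is only a sketch, not ANTLR itself. The `tokenize` helper and the regex translations of the rules are assumptions made for illustration, and the rule sets are reduced to the three rules that matter for this input (the letter-only helper rules are left out).

```python
import re

def tokenize(text, rules, skip=()):
    """Toy lexer: longest match wins, ties go to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Question's approach: one token may span several words and whitespace.
MULTI_WORD_RULES = [
    ("IN", r"in"),
    ("ALPHA_NUMERIC_WS", r"[A-Za-z]+[A-Za-z0-9 \t]*[A-Za-z0-9]+"),
    ("WHITESPACE", r"[ \t]+"),
]

# Answer's approach: one token per word, whitespace skipped.
PER_WORD_RULES = [
    ("IN", r"in"),
    ("ALPHA_NUMERIC", r"[A-Za-z]+[A-Za-z0-9]*"),
    ("WHITESPACE", r"[ \t]+"),
]

text = "Active Indicator in"
print(tokenize(text, MULTI_WORD_RULES))
# [('ALPHA_NUMERIC_WS', 'Active Indicator in')]
print(tokenize(text, PER_WORD_RULES, skip={"WHITESPACE"}))
# [('ALPHA_NUMERIC', 'Active'), ('ALPHA_NUMERIC', 'Indicator'), ('IN', 'in')]
```

In the first rule set the multi-word rule swallows "in" because it yields the longest match. In the second, "in" is matched by both IN and ALPHA_NUMERIC with the same length, and IN wins only because it is listed first.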

Raven
  • Thanks for the great suggestion, but it's not exactly helping me. Now, if the input contains only letters and no digits, the parser fails to match leftOperand: line 1:0 mismatched input 'Active' expecting ALPHA_NUMERIC (condition (leftOperand Active Indicator) (operator in)). To make it work I can redefine ALPHA_NUMERIC_WS as a parser rule on top of your suggestion, but then my interfacing program would need to know about the alpha_numeric rule, which otherwise should have stayed abstract. – ImGroot Jan 22 '18 at 06:07
  • To add string support you simply need to extend the leftOperand parser rule to also match strings... I don't see why you'd need to make ALPHA_NUMERIC_WS a parser rule at all. – Raven Jan 22 '18 at 06:21
  • Please try it with the same input, "Active Indicator in": only the operator matches, and the parser fails to recognize leftOperand. – ImGroot Jan 22 '18 at 07:54
  • Also, skipping whitespace is not an option for me. – ImGroot Jan 22 '18 at 12:37
  • Well, then you have to actually invest some of your own time in order to solve the problem. I provided a working solution to your stated problem. If it doesn't fit your needs, then you have to adapt it. If you have a specific question on that, post a new question here on SO with the details. What might help you with your WS problem: https://stackoverflow.com/questions/45504124/ambiguous-call-expression-in-antlr4-grammar/45511037#45511037 – Raven Jan 22 '18 at 14:56
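
The error reported in the comments ("mismatched input 'Active' expecting ALPHA_NUMERIC") is consistent with the lexer's tie-breaking: when two token rules match the same text, the rule defined first wins, and in the posted grammar WORD comes before ALPHA_NUMERIC. The sketch below models only that tie-break in Python. The `winning_rule` helper and the regex versions of the rules are hypothetical, and declaring WORD as a fragment is one possible adaptation that has not been verified here against the ANTLR tool.

```python
import re

def winning_rule(lexeme, rules):
    """Return the first rule, in definition order, that matches the whole lexeme."""
    for name, pattern in rules:
        if re.fullmatch(pattern, lexeme):
            return name
    return None

# Token rules in the order of the posted answer (WORD can produce tokens).
AS_POSTED = [
    ("IN", r"in"),
    ("WORD", r"[A-Za-z]+"),
    ("ALPHA_NUMERIC", r"[A-Za-z]+[A-Za-z0-9]*"),
]

# Assumed adaptation: WORD is a fragment, so it is no longer a candidate token.
WORD_AS_FRAGMENT = [
    ("IN", r"in"),
    ("ALPHA_NUMERIC", r"[A-Za-z]+[A-Za-z0-9]*"),
]

for lexeme in ("Active", "Active1", "in"):
    print(lexeme, winning_rule(lexeme, AS_POSTED), winning_rule(lexeme, WORD_AS_FRAGMENT))
# Active WORD ALPHA_NUMERIC
# Active1 ALPHA_NUMERIC ALPHA_NUMERIC
# in IN IN
```

Under this model a letters-only word is emitted as WORD, which the leftOperand parser rule does not accept. Once WORD can no longer produce tokens, the same word comes out as ALPHA_NUMERIC while "in" is still matched as IN.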