I need to write an Xtext grammar for a language that allows the hyphen ('-') in variable names.

I tried the toy example below, where I define ID as the pattern for variable names. What I am trying to specify is that an ID starts with a letter, is optionally followed by zero or more letters, digits, or hyphens, and ends with a letter or digit (never a hyphen).

```
grammar org.xtext.example.mydsl.MyDsl hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"

Model:
    lines+=Line*
;

Line:
    (Variable | Expression) '.'
;

Variable:
    name=ID
;

Expression:
    '=' Atomic ({LinkedExpression.leftOperand=current} op=('->') rightOperand=Atomic)*
;

Atomic returns Expression:
    ref=[Variable]
;

terminal ID: ('a'..'z'|'A'..'Z')(('a'..'z'|'A'..'Z'|'0'..'9'|'-')*('a'..'z'|'A'..'Z'|'0'..'9'))?;
terminal INT: ('0'..'9')+;
terminal WS: (' '|'\t'|'\r'|'\n')+;
```

However, when I try to parse the text below:

```
x.
y.
m-n.
=x->y.
=x -> y.
=m-n -> y.
=m-n->y.
```


Lines such as `=x->y.` and `=m-n->y.` (no spaces around the arrow) fail because the '-' of the '->' operator is read as part of a variable name. For example:

```
=x->y.
```

is tokenised as:

```
=   x-   >   y   .
```

instead of:

```
=   x   ->   y   .
```
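A rough way to see what is going on: an ANTLR 3 lexer (which Xtext uses) does not backtrack, so once the `(...)*` loop of `ID` has consumed the '-' it cannot give it back when the mandatory trailing letter or digit turns out to be missing. The Python function below is only a simplified model of that behaviour, not the code Xtext generates; the name `greedy_tokenize` is made up, and where the real lexer would report an error the model simply emits the partial token.

```python
import string

ID_START = set(string.ascii_letters)
ID_BODY = set(string.ascii_letters + string.digits)


def greedy_tokenize(text):
    """Simplified model of a non-backtracking lexer for the question's ID rule.

    Inside an ID, letters, digits and '-' are all consumed greedily and never
    given back, so "x->y" produces "x-" followed by ">".
    """
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1  # WS is hidden
        elif c in ID_START:
            j = i + 1
            while j < len(text) and (text[j] in ID_BODY or text[j] == "-"):
                j += 1
            # A token ending in '-' violates the rule; the real lexer would
            # report an error here, this model just emits what it consumed.
            tokens.append(text[i:j])
            i = j
        elif text.startswith("->", i):
            tokens.append("->")
            i += 2
        else:
            tokens.append(c)  # '=', '.', and any other single character
            i += 1
    return tokens


print(greedy_tokenize("=x->y."))    # ['=', 'x-', '>', 'y', '.']
print(greedy_tokenize("=x -> y."))  # ['=', 'x', '->', 'y', '.']
```

With a space before the arrow the loop stops at the blank, which is why `=x -> y.` works while `=x->y.` does not.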

What am I doing wrong?

mzattera
  • What you wrote would work in Antlr4, but XText is stuck in Antlr3. To do this in Antlr3, you'd write a [gated semantic predicate](https://stackoverflow.com/questions/3056441/what-is-a-semantic-predicate-in-antlr/3056517#3056517), e.g., `ID: ('a'..'z'|'A'..'Z') ( ( { input.LA(1) == '-' && input.LA(2) != '>'}? => ('-' ('a'..'z'|'A'..'Z'|'0'..'9')) ) | ('a'..'z'|'A'..'Z'|'0'..'9') )* ;` which I tested. I don't know whether XText allows this syntax, but it's worth a try. – kaby76 Nov 29 '21 at 00:27
  • It seems, from a quick Google lookup, that this could only be done by changing the generated .g file manually... or by writing some kind of plugin to do that. I am afraid I am not a sophisticated enough user to do that. Maybe it is possible by using parser rules to define ID instead of lexer rules? – mzattera Nov 29 '21 at 08:14
  • Right. XText doesn't support gated semantic predicates. But, the rule can be written as a syntactic predicate, which XText does support. In Antlr3, the rule would be `ID: ('a'..'z'|'A'..'Z') ( ( ('-' ('a'..'z'|'A'..'Z'|'0'..'9')) => ('-' ('a'..'z'|'A'..'Z'|'0'..'9')) ) | ('a'..'z'|'A'..'Z'|'0'..'9') )* ;`, and that works. I assume the rule in XText would be `terminal ID: ('a'..'z'|'A'..'Z') ( ( => ('-' ('a'..'z'|'A'..'Z'|'0'..'9')) ) | ('a'..'z'|'A'..'Z'|'0'..'9') )* ;`, but I can't check it at the moment because I don't have Eclipse/XText set up--whacked machine with Win11 OS upgrade. – kaby76 Nov 29 '21 at 09:49
  • Maybe you should use datatype rules for both the `->` and the `-` in identifiers. – Christian Dietrich Nov 29 '21 at 16:03
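The syntactic predicate in kaby76's last rule, `=> ('-' ('a'..'z'|'A'..'Z'|'0'..'9'))`, amounts to a two-character lookahead: a '-' is only taken into the identifier when a letter or digit follows it. A simplified Python model of that idea (hypothetical, not generated code; `predicated_tokenize` is a made-up name) could look like this:

```python
import string

ID_START = set(string.ascii_letters)
ID_BODY = set(string.ascii_letters + string.digits)


def predicated_tokenize(text):
    """Model of ID: letter ( (=> '-' body) | body )* using 2-char lookahead."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1  # WS is hidden
        elif c in ID_START:
            j = i + 1
            while j < n:
                if text[j] in ID_BODY:
                    j += 1
                elif text[j] == "-" and j + 1 < n and text[j + 1] in ID_BODY:
                    # Predicate succeeded: '-' is followed by a letter/digit,
                    # so both characters belong to the identifier.
                    j += 2
                else:
                    # '-' not followed by a letter/digit (e.g. '->'):
                    # stop here and leave it for the next token.
                    break
            tokens.append(text[i:j])
            i = j
        elif text.startswith("->", i):
            tokens.append("->")
            i += 2
        else:
            tokens.append(c)
            i += 1
    return tokens


print(predicated_tokenize("=m-n->y."))  # ['=', 'm-n', '->', 'y', '.']
```

Because the lookahead checks the character after the '-', an identifier can never end with a hyphen, and the '-' of `->` is always left for the arrow token.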

1 Answer

As suggested, I used a parser rule (a datatype rule) to define IDs. This makes things a bit trickier, as you now need syntactic predicates (`->`) to give the new `Id` rule precedence over operators such as `->` and the newly introduced minus operator ('-'). In addition, I use `hidden()` when defining `Id` to make sure NO spaces are allowed inside variable names.

The resulting grammar is pasted below. If somebody has a better or shorter grammar that does the same, please let me know; it looks a bit overcomplicated to me, but it works as expected.

```
grammar org.xtext.example.mydsl.MyDsl hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"

Model:
    lines+=Line*
;

Id hidden():
    LETTERS+ (->'-' (LETTERS|NUMBERS)+)*
;

Line:
    (Variable | Expression) '.'
;

Variable:
    name=Id
;

Expression:
    '=' AdditionExpression
;

AdditionExpression returns Expression:
    LinkedExpression ({AdditionExpression.leftOperand=current} op=('+'|'-') rightOperand=LinkedExpression)*
;

LinkedExpression returns Expression:
    Atomic ->({LinkedExpression.leftOperand=current} op=(ARROW_OPERATOR) rightOperand=LinkedExpression)*
;

Atomic returns Expression:
    ref=[Variable|Id]
;

terminal ARROW_OPERATOR: '->';
terminal LETTERS: ('a'..'z'|'A'..'Z');
terminal NUMBERS: ('0'..'9')+;
terminal WS: (' '|'\t'|'\r'|'\n')+;
```

Notice that now:

```
x.
y.
m-n.

=m-n-x->x-y-m-n.
```

correctly causes a parse error, as it is read as:

```
= (m-n-x) -> (x-y-m-n)
```

and no variable named "m-n-x" or "x-y-m-n" exists, whereas:

```
=m-n -x->x - y- m-n.
```

is interpreted correctly as:

```
= ( (m-n) - x ) -> (x - (y - (m-n))).
```

This, of course, forces the user to use spaces around the '-' operator (which I see no way to avoid).
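To illustrate why the `hidden()` datatype rule groups tokens this way, here is a simplified Python model of the token stream and of the `Id` rule. Assumptions: tokens are produced by longest match (so '->' wins over '-'), `WS` is skipped everywhere except inside `Id`, and `->'-'` is treated as a first-token predicate that commits as soon as it sees '-'. It deliberately ignores operator precedence and is not Xtext's generated parser; `lex`, `parse_id` and `ids_and_operators` are made-up names.

```python
import re

# Hypothetical token kinds mirroring the answer's terminals; ARROW is listed
# first so that "->" is preferred over a lone "-".
TOKEN_RE = re.compile(
    r"(?P<ARROW>->)|(?P<LETTERS>[A-Za-z])|(?P<NUMBERS>[0-9]+)|(?P<WS>\s+)|(?P<OP>[-+=.])"
)


def lex(text):
    """Split text into (kind, value) pairs, keeping WS tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_id(tokens, i):
    """Model of: Id hidden(): LETTERS+ (->'-' (LETTERS|NUMBERS)+)*

    WS is NOT skipped here, because hidden() switches off hidden(WS),
    so any whitespace ends the identifier.
    """
    parts = []
    while i < len(tokens) and tokens[i][0] == "LETTERS":
        parts.append(tokens[i][1])
        i += 1
    if not parts:
        raise SyntaxError("identifier must start with a letter")
    while i < len(tokens) and tokens[i] == ("OP", "-"):
        # First-token predicate: seeing '-' is enough to commit to this branch.
        parts.append("-")
        i += 1
        start = i
        while i < len(tokens) and tokens[i][0] in ("LETTERS", "NUMBERS"):
            parts.append(tokens[i][1])
            i += 1
        if i == start:
            raise SyntaxError("'-' in an identifier must be followed by a letter or digit")
    return "".join(parts), i


def ids_and_operators(text):
    """Flatten an input line into identifiers and operator symbols."""
    tokens = lex(text)
    out, i = [], 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "WS":
            i += 1  # outside Id the grammar-wide hidden(WS) applies
        elif kind == "LETTERS":
            name, i = parse_id(tokens, i)
            out.append(name)
        else:
            out.append(value)
            i += 1
    return out


print(ids_and_operators("=m-n-x->x-y-m-n."))  # ['=', 'm-n-x', '->', 'x-y-m-n', '.']
print(ids_and_operators("=m-n -x->x - y."))   # ['=', 'm-n', '-', 'x', '->', 'x', '-', 'y', '.']
```

In this model a '-' directly after an identifier character is always absorbed into the identifier, and only whitespace (which `hidden()` makes visible inside `Id`) stops that, which is where the need for spaces comes from.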

mzattera
  • Did you look at datatype rules, as proposed? `ARROW_OPERATOR hidden(): '-''>';` `ID hidden():LETTERS+ (->'-' (LETTERS|NUMBERS)+)*; ` – Christian Dietrich Dec 02 '21 at 09:42
  • Hi @ChristianDietrich, your suggestion above generates a warning: `warning(200): ../org.xtext.example.mydsl/src-gen/org/xtext/example/mydsl/parser/antlr/internal/InternalMyDsl.g:424:3: Decision can match input such as "'-'" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input`, which means I still need the syntactic predicate when I define LinkedExpression, so it does not seem to simplify the grammar. – mzattera Dec 04 '21 at 09:47
  • Yes, but it may solve the space problem. – Christian Dietrich Dec 04 '21 at 10:50