ANTLR4: context-sensitive spaces?

Question

In a grammar I would like to support text values such as xxx without string delimiters. The idea is to write things like

a = xxx;

instead of

a ="xxx";

to simplify typing. Besides that, the grammar should also support variable definitions and other kinds of constructs.

As a first approach I experimented with this grammar:

    grammar SpaceNoSpace;

    prog: stat+;

    stat:
          'somethingelse' ';'
        | typed description* content
        ;

    typed:
          'something' '-'
        | 'anotherthing' '-'
        ;

    description:
          'someSortOfDetails' COLON ID HASH
        | 'otherSortOfDetails' COLON ID HASH
        ;

    content:
          contenttext ';'
        ;

    contenttext:
          (~';')*
        ;

    COLON: ':' ;
    HASH: '#';
    SEMI: ';';
    SPACE: ' ';
    ID: [a-zA-Z][a-zA-Z0-9]+;
    WS  :   [ \t\n\r]+ -> channel(HIDDEN);
    ANY_CHAR : . ;

This works fine for input files like this:

    something-someSortOfDetails: aVariableName#
    this is the content of this;

    anotherthing-someSortOfDetails: aVariableName#
    here spaces are accepted as much        as you like;

    somethingelse;

But modifying the last line to

    somethingelse ;

leads to a syntax error:

    line 7:15 extraneous input ' ' expecting ';'

This probably reveals that the lexer rule

    WS  :   [ \t\n\r]+ -> channel(HIDDEN);

is not applied (but the SPACE rule is, apparently?).

Otherwise, if I delete the SPACE lexer rule, the space in "somethingelse ;" is ignored (by the lexer rule WS), so the parser rule stat: 'somethingelse' ';' matches correctly. But with the SPACE rule deleted, runs of spaces in the content text are reduced to single spaces, so "this    here" is reduced to "this here".

This is not a big problem, but nevertheless it is an interesting question:

Is it possible to implement context-sensitive WS or SPACE lexer rules?

Within the content parser rule every space should be preserved; in any other rule spaces should be ignored.

Is it possible to define such a context-sensitive lexer-rule behavior in ANTLR4?

This: http://stackoverflow.com/questions/29060496/allow-whitespace-sections-antlr4 seems to be very close to an answer. Maybe this could also be done within the grammar? Or even more easily? – Mike75, Feb 01 '16 at 23:21
That looks like an answer to me, which would make this question a duplicate. – rici, Feb 02 '16 at 00:37

CoronA · Accepted Answer · 2016-02-02T05:14:36.970

1

Have you considered lexer modes? The section on mode(), pushMode(), and popMode() is probably interesting for you.

Yet I think that lexer modes are more of a problem than a solution. Their purpose is to use (parser) context in the lexer. Consequently, one should discard the paradigm of separating lexer and parser and use a PEG parser instead.
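
To make the idea of a mode stack concrete, here is a minimal Python sketch that imitates what pushMode() and popMode() do. It is not ANTLR code. It assumes, for illustration only, that content always starts right after '#' and ends at ';', which the grammar in the question does not guarantee (a stat may have no description at all). The names `tokenize_with_modes`, `TEXT`, and `DASH` are hypothetical.

```python
import re

# Token names for single characters; DASH is a made-up name for this sketch.
PUNCT = {":": "COLON", "#": "HASH", ";": "SEMI", "-": "DASH"}
DEFAULT_TOKEN = re.compile(r"\s+|[A-Za-z][A-Za-z0-9]*|.", re.DOTALL)
CONTENT_TEXT = re.compile(r"[^;]+")


def tokenize_with_modes(source):
    """Lex with a mode stack: '#' pushes CONTENT, the closing ';' pops it."""
    modes = ["DEFAULT"]
    tokens, pos = [], 0
    while pos < len(source):
        if modes[-1] == "CONTENT":
            match = CONTENT_TEXT.match(source, pos)
            if match:
                # Everything up to ';' is one TEXT token, spaces included.
                tokens.append(("TEXT", match.group()))
                pos = match.end()
            else:
                # The current character is ';': emit it and leave the mode.
                tokens.append(("SEMI", ";"))
                modes.pop()
                pos += 1
            continue
        match = DEFAULT_TOKEN.match(source, pos)
        text = match.group()
        pos = match.end()
        if text.isspace():
            continue  # whitespace is dropped outside of content
        kind = PUNCT.get(text, "ID" if text[0].isalpha() else "ANY_CHAR")
        tokens.append((kind, text))
        if text == "#":
            modes.append("CONTENT")  # the switch is made by a lexer token
    return tokens
```

For the input `d: v#a  b; e ;` the two spaces inside `a  b` survive in the TEXT token, while the spaces around `e` are dropped. The mode switch is tied to a token the lexer can recognize on its own; the lexer has no access to the parser rule that is currently active.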

edited Feb 02 '16 at 05:14

answered Feb 02 '16 at 05:03

CoronA

"Modes are not allowed within combined grammars", therefore I don't know how to use them here. The ANTLR4 book gives an example where, for XML, the mode selection is triggered by '<' and '>', both of which are lexer rules. I tried to modify the content rule: content: ->mode(WITH_SPACE) contenttext ';' ->mode(WITHOUT_SPACE) ; but exactly this seems to be forbidden (no chance to place the switch into a parser rule?). – Mike75 Feb 02 '16 at 20:17
Is there maybe a chance to apply the WS rule (kill every space in the first step) and let the content parser rule get its spaces back from the hidden channel (insert them into the content character stream after parsing)? I don't know how to get access to the hidden characters, and especially not whether it is possible to access them per rule. – Mike75 Feb 02 '16 at 20:21
Most (all?) combined grammars can be transformed to separate lexer/parser grammars. Maybe this is an option for you. Your second suggestion is exactly what I did in the posting you already mentioned. – CoronA Feb 03 '16 at 06:03
Your solution to your own question seems to be the easiest way to solve it, since modes cannot be applied to a parser rule. I thought there would be a way to use modes or channels without writing code (just with pure grammar statements). But as it seems, this is not possible. So your code allows a really easy implementation of the rule within the grammar rules. Thanks! – Mike75 Feb 03 '16 at 07:53
Then we should mark this question as duplicate, right? – CoronA Feb 03 '16 at 10:41
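
The hidden-channel idea discussed in these comments can be sketched in a few lines of Python. This is a simulation under stated assumptions, not the code from the linked answer and not the ANTLR runtime API: the `Token` class, the `lex` function, and the channel constants are hypothetical stand-ins. The point is only that whitespace sent to a hidden channel still sits in the token list, so the text covered by a rule can be rebuilt from its first and last token index.

```python
import re
from dataclasses import dataclass

DEFAULT, HIDDEN = 0, 1  # channel numbers chosen for this sketch


@dataclass
class Token:
    text: str
    channel: int


def lex(source):
    """Split source into tokens; whitespace runs go to the hidden channel."""
    tokens = []
    for match in re.finditer(r"\s+|\w+|.", source):
        text = match.group()
        channel = HIDDEN if text.isspace() else DEFAULT
        tokens.append(Token(text, channel))
    return tokens


def visible(tokens):
    """What a parser would see: default-channel tokens only."""
    return [t.text for t in tokens if t.channel == DEFAULT]


def content_text(tokens, start, stop):
    """Rebuild the text of tokens[start..stop], hidden tokens included."""
    return "".join(t.text for t in tokens[start:stop + 1])
```

With `tokens = lex("a = this   here;")`, the parser-side view `visible(tokens)` contains no whitespace at all, while `content_text(tokens, 4, 6)` gives back `this   here` with all three spaces. In a real ANTLR4 parser the start and stop indices would come from the matched rule context; how exactly to read them depends on the target runtime.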

score 0 · Answer 2 · answered Feb 01 '16 at 23:28

Since the SPACE rule is before the WS rule, the lexer is returning a SPACE token to the parser. The ' ' is not being placed on the hidden channel.
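
The rule-order effect can be reproduced with a small Python simulation. It is a sketch of the general "longest match wins, earlier rule wins on a tie" behavior, assuming that is how the lexer resolves the conflict here. It is not ANTLR itself, keyword literals such as 'somethingelse' are left out, and the function names are hypothetical.

```python
import re

# Lexer rules in the same order as in the question's grammar.
RULES = [
    ("SEMI", re.compile(r";")),
    ("SPACE", re.compile(r" ")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]+")),
    ("WS", re.compile(r"[ \t\n\r]+")),
    ("ANY_CHAR", re.compile(r".")),
]


def next_token(text, pos):
    """Return (rule_name, lexeme) for the token starting at pos."""
    best_name, best_lexeme = None, ""
    for name, pattern in RULES:
        match = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so on a tie the rule listed first is kept.
        if match and len(match.group()) > len(best_lexeme):
            best_name, best_lexeme = name, match.group()
    if best_name is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best_name, best_lexeme


def tokenize(text):
    """Tokenize the whole input with the rules above."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens
```

For `somethingelse ;` the single space ties between SPACE and WS, so SPACE wins and a visible token reaches the parser. For `as  you` the two spaces form a longer WS match, which is why runs of several spaces in the content lines did not cause errors.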

GRosenberg

This does not answer the question, "is it possible to define such a context-sensitive lexer-rule behavior in ANTLR4?". – Mephy Feb 01 '16 at 23:50
Well, yes, of course. You basically have it except you stumbled over the `SPACE` rule. Might be better from a general design point of view to use lexical modes to isolate exactly where and how you want to handle whitespace sensitivity. – GRosenberg Feb 02 '16 at 06:32
