I'm trying to create a Beginning-Of-Line token:

lexer grammar ScriptLexer;

BOL : {getCharPositionInLine() == 0;}; // Beginning Of Line token

But this produces the error:

The name 'getCharPositionInLine' does not exist in the current context

because ANTLR generates this code:

private void BOL_action(RuleContext _localctx, int actionIndex) {
    switch (actionIndex) {
    case 0: getCharPositionInLine() == 0; break;
    }
}

Where the getCharPositionInLine() method doesn't exist...

Tar
  • Maybe try `GetCharPositionInLine()` (PascalCase as recommended by various C# code guidelines) – knittl Aug 09 '15 at 12:09
  • @knittl, tried that. No method with a name that is even similar to that... – Tar Aug 09 '15 at 12:22
  • Have a look at the lexer class: https://github.com/antlr/antlr4-csharp/blob/master/runtime/CSharp/Antlr4.Runtime/Lexer.cs There is a `charPositionInLine` in there, but I'm not really familiar with C# to post an answer (hence this comment). – Bart Kiers Aug 09 '15 at 13:53
  • @knittl C# has properties in the language, so you won't see many getter functions in C# code :-) The solution here is to use the `Column` property, so `fragment BOL : { Column == 0 } ;` (or `== 1`, dunno) should probably work (I don't think it makes sense to have an empty lexer rule, hence the `fragment`). – Lucas Trzesniewski Aug 09 '15 at 19:49
  • @LucasTrzesniewski - that was it. Please post an answer so I can accept it – Tar Aug 10 '15 at 12:41
  • If anybody is looking for Typescript property it's `this.charPositionInLine === 0;` where `this` refers to Lexer superclass. – K.Novichikhin Oct 21 '20 at 17:18

1 Answer

The simplest approach is to treat each end of line (EOL) as the BOL token for the line that follows.

BC  : '/*' .*? '*/' -> channel(HIDDEN) ;
LC  : '//' ~[\r\n]* -> channel(HIDDEN) ;
HWS : [ \t]+        -> channel(HIDDEN) ;
BOL : [\r\n\f]+ ;

A block comment rule such as BC consumes any EOLs inside the comment, so those cause no problem. A line comment rule such as LC stops before the EOL, so a proper BOL is emitted for the line immediately following.

A potential problem is that no BOL will be emitted at the beginning of the input. The simplest way to handle this is to prepend a line terminator to the input text before feeding it to the lexer.
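
A minimal sketch of that prefixing step, assuming the C# target from the question and a `ScriptLexer` class generated from the grammar above. The `ScriptLexerFactory` helper name is hypothetical and only there for illustration.

```csharp
using Antlr4.Runtime;

public static class ScriptLexerFactory
{
    // Hypothetical helper: prepends "\n" so that the BOL rule matches
    // before the first real line of the script.
    public static ScriptLexer Create(string text)
    {
        // AntlrInputStream wraps the (prefixed) text as a character stream.
        var input = new AntlrInputStream("\n" + text);

        // ScriptLexer is assumed to be the class ANTLR generates from the grammar.
        return new ScriptLexer(input);
    }
}
```

Note that the extra line terminator shifts every reported line number by one, so error messages may need to subtract one from the token's line.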

GRosenberg
  • Excellent answer, it helped me with a similar question (I got here via https://stackoverflow.com/q/32870858/1112244). I will add that if you don't route `BOL` to a hidden channel, you will have to include it in your parser everywhere you expect to encounter those characters. In my case, I use a separate lexer and parser, and I defined in my lexer the token that had to appear at the beginning of the line (it is a line label). My parser rules are not EOL-delimited otherwise, so I routed `BOL` to a hidden channel in order to avoid adding it as a parser rule. – Peter Nov 07 '17 at 04:06