I'm using ANTLR4 to write a parser for Matlab. One of the main issues is that strings and the transpose operator both use single quotes. The solution I had in mind was to define the STRING lexer rule roughly as follows:

(if the previous token is not ')', '}', ']' or [a-zA-Z0-9]) then match '\'' ( ESC_SEQ | ~('\\'|'\''|'\r'|'\n') )* '\'' (note that I do not want to consume the previous token when the condition holds).

Does anyone know a workaround for this problem, given that ANTLR4 does not support negative lookahead?

Alex Botev

1 Answer

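You can get the effect of a negative lookahead in ANTLR4 with a semantic predicate that calls _input.LA(-1), which returns the character just before the current position without consuming anything (in Java; see "how to resolve simple ambiguity" or "ANTLR4 negative lookahead in lexer").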
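As a rough illustration of the check such a predicate would perform, here is a minimal Java sketch. The class `QuoteContext` and its methods are hypothetical names of my own, not part of ANTLR; the set of characters is the one listed in the question, and I am assuming that `_input.LA(-1)` yields a negative value (EOF) when there is no previous character.

```java
// Hypothetical helper, not part of ANTLR: decides how a single quote should
// be read, given the character just before it (the value of _input.LA(-1)).
final class QuoteContext {
    private QuoteContext() {}

    /** True when a quote following prevChar should be read as transpose. */
    static boolean isTranspose(int prevChar) {
        // Assumption: a negative value means "no previous character"
        // (start of input), so the quote can only open a string.
        if (prevChar < 0) {
            return false;
        }
        return prevChar == ')' || prevChar == '}' || prevChar == ']'
            || (prevChar >= 'a' && prevChar <= 'z')
            || (prevChar >= 'A' && prevChar <= 'Z')
            || (prevChar >= '0' && prevChar <= '9');
    }

    /** True when a quote following prevChar may start a string literal. */
    static boolean startsString(int prevChar) {
        return !isTranspose(prevChar);
    }
}
```

With the helper on the lexer's classpath, the STRING rule could begin with a predicate along the lines of `{QuoteContext.startsString(_input.LA(-1))}?`, so the rule only applies when the quote cannot be a transpose. Note that this looks at the previous character rather than the previous token, so whitespace before the quote counts as "not a transpose context".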

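You can also use lexer modes to deal with this kind of thing, but then your lexer has to be defined in its own file (modes are only allowed in lexer grammars, not in combined grammars). The idea is to switch from a mode in which one set of tokens can match to another mode in which a different set can match.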

Here is an example from ANTLR4 lexer documentation:

// Default "mode": Everything OUTSIDE of a tag
COMMENT : '<!--' .*? '-->' ;
CDATA : '<![CDATA[' .*? ']]>' ;
OPEN : '<'                     -> pushMode(INSIDE) ;
 ...
XMLDeclOpen : '<?xml' S        -> pushMode(INSIDE) ;
...

// ----------------- Everything INSIDE of a tag ---------------------
mode INSIDE;
CLOSE : '>'         -> popMode ;
SPECIAL_CLOSE: '?>' -> popMode ; // close <?xml...?>
SLASH_CLOSE : '/>'  -> popMode ;
Vincent Aranega
  • Do you know any way to specify a range in the semantic predicate, e.g. {_input.LA(-1) != [a-zA-Z0-9]}? – Alex Botev Mar 06 '15 at 14:33
  • The predicate code is copied verbatim into the generated lexer (in Java, or whatever your target language is), so I don't think you can use a range directly. However, since it is plain Java (or your target language), you can use `matches(...)` or an equivalent function. That said, in your case lexer modes should be more appropriate. – Vincent Aranega Mar 06 '15 at 14:50
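To make the `matches(...)` suggestion concrete, here is a small Java sketch of two ways to test whether the value returned by `_input.LA(-1)` falls in [a-zA-Z0-9]. The class `RangeCheck` and its method names are hypothetical, chosen only for illustration, and a negative input is assumed to mean that there is no previous character.

```java
// Hypothetical helper: two equivalent ways to test whether a character code
// (such as the int returned by _input.LA(-1)) lies in [a-zA-Z0-9].
final class RangeCheck {
    private RangeCheck() {}

    /** Regex version, using String.matches as suggested in the comment. */
    static boolean isAlnumByRegex(int c) {
        if (c < 0) {
            return false; // assumption: negative means no previous character
        }
        return new String(Character.toChars(c)).matches("[a-zA-Z0-9]");
    }

    /** Plain comparison version; avoids compiling a regex on every call. */
    static boolean isAlnumByCompare(int c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }
}
```

Either method could be called from the predicate, for example `{!RangeCheck.isAlnumByCompare(_input.LA(-1))}?`. The comparison version is probably the better fit for a lexer, since the predicate may run very often.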