I'm working on a Lark-based project where I need to be able to "catch" comments in the code being parsed.

However, the callback is not invoked unless I explicitly specify the standard lexer.

I have taken the second example from the Lark recipes and modified it to use the default parser and to parse C++-like one-line comments:

import lark

comments = []

grammar = r'''
start: INT*

COMMENT: "//" /[^\n]*/

%import common (INT, WS)
%ignore COMMENT
%ignore WS
'''

# This doesn't work, comments are not appended to the list
# parser = lark.Lark(grammar, lexer_callbacks={'COMMENT': comments.append})

# But this does work
parser = lark.Lark(grammar, lexer='standard', lexer_callbacks={'COMMENT': comments.append})

source = r'''
1 2 3  // hello
// world
4 5 6
'''

parser.parse(source)

print(comments)

If I don't have lexer='standard' the result is an empty list.

But shouldn't it already be using the 'standard' lexer when one isn't explicitly specified? Is it a mistake in my code, or a possible bug in Lark?


Further experimentation seems to indicate that either the 'dynamic' or the 'dynamic_complete' lexer is used in the default case (lexer not specified).

Some programmer dude
From [the code](https://github.com/lark-parser/lark/blob/87a18a098e306dbe0f4258732ad8944832dc4a39/lark/lark.py#L306), it seems that with the default `auto` value for the lexer and without specifying a parser or `postlex`, you should indeed get the standard lexer... it may be worth stepping into the `Lark()` call to see exactly what goes on in there – GPhilo Jul 19 '21 at 08:52

1 Answer

Lark supports different combinations of parser and lexer. Some support lexer_callbacks, some don't:

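| parser | lexer | lexer_callbacks |
|--------|-------|-----------------|
| lalr | standard | Yes |
| lalr | contextual | Yes |
| earley | standard | Yes |
| earley | dynamic | No |
| earley | dynamic_complete | No |
| lalr | custom | (Maybe) |
| earley | custom | (Maybe) |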

lexer="auto" selects a lexer depending on the parser: For lalr it selects contextual, for earley it selects dynamic. The default parser is earley, so without selecting parser or lexer, lexer_callbacks are not supported.

An issue about this was already opened and closed again.

MegaIng