How to "catch" terminals in Lark-based parser using the 'dynamic' lexers

Question

This is a follow-up to my previous question: Why do we need to specify the standard Lark lexer to be able to catch comment terminals?

I need to "catch" and save comments in a DSL parsed by a Lark-based parser. This works well with the 'standard' lexer, but with that lexer the grammar can't parse the rest of the DSL.

Instead, the 'dynamic' or 'dynamic_complete' lexer needs to be used, but with those the comments can't seem to be "caught".

I have been using a variant of the second example from Lark's own recipes for testing:

import lark

comments = []

grammar = r'''
start: INT*

COMMENT: "//" /[^\n]*/

%import common (INT, WS)
%ignore COMMENT
%ignore WS
'''

parser = lark.Lark(grammar, lexer='dynamic', lexer_callbacks={'COMMENT': comments.append})

source = r'''
1 2 3  // hello
// world
4 5 6
'''

parser.parse(source)

print(comments)

This program prints an empty list ([]): the parser ignores the comments as intended, but the callback is never invoked for them.

Are there other ways to "catch" and save terminals that otherwise need to be ignored?
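One workaround I have considered, which sidesteps the lexer entirely, is to scan the source text for comments in a separate pass before parsing. The following is only a minimal sketch under a strong assumption: that `//` never occurs inside any other token of the DSL (for example inside a string literal). The names `COMMENT_RE` and `collect_comments` are hypothetical helpers of my own, not part of Lark; the regular expression simply mirrors the COMMENT terminal from the grammar above.

```python
import re

# Mirrors the COMMENT terminal in the grammar: "//" up to the end of the line.
# Assumption: "//" never appears inside another token (e.g. a string literal).
COMMENT_RE = re.compile(r"//[^\n]*")


def collect_comments(text):
    """Return a (line, column, comment) tuple for every //-comment in text.

    Line and column are 1-based.
    """
    found = []
    for match in COMMENT_RE.finditer(text):
        start = match.start()
        # Line number: count the newlines that precede the comment.
        line = text.count("\n", 0, start) + 1
        # Column: offset from the character after the previous newline.
        column = start - (text.rfind("\n", 0, start) + 1) + 1
        found.append((line, column, match.group()))
    return found


source = '''
1 2 3  // hello
// world
4 5 6
'''

comments = collect_comments(source)
print(comments)
```

The same source string can then be passed to `parser.parse(source)` with the 'dynamic' lexer and `%ignore COMMENT` left in place. This does not answer whether the dynamic lexers themselves can report ignored terminals; it only avoids depending on them.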

@MegaIng I'm not. The actual grammar I have doesn't work with the `lalr` parser or the `standard` lexer, at least not without major modifications. Are there other ways to intercept terminals during lexing? — Some programmer dude, Jul 19 '21 at 11:31
I mean the opposite, e.g. using `parser='lalr'`. There are problems with using `lexer_callbacks` from inside the dynamic lexer (e.g. we don't know which terminals will be used in the end, so you might get comments more than once). It is probably easier to redesign the grammar. — MegaIng, Jul 19 '21 at 11:46
I got a reply to a question in a related Lark issue on GitHub ([link to comment](https://github.com/lark-parser/lark/issues/386#issuecomment-882486006)) which says that an alternative is to update the Lark parser itself to collect ignored tokens. It made me think about a possible patch where collected ignored tokens are passed to a user-specified function (possibly filtered), similar to `lexer_callbacks`. — Some programmer dude, Jul 20 '21 at 10:29
Yes, that would be possible. I already looked into the source code (I am a de facto co-maintainer), but it isn't that easy. Because of the way the EarleyForest is structured, one has to be careful where the ignored terminals are collected. — MegaIng, Jul 20 '21 at 11:15
