I was trying to simplify the case presented in another question and ended up with the following parsing attempt using Lark:
```python
from lark.lark import Lark

text = """
start_thing {
    loc int {
        from 0,
        to 1093,
        strand plus,
        id gi 384632836
    }
}
"""

grammar = """\
thing: "start_thing" node
locus_info: "loc int" "{" int_info "," int_info "," STRAND_INFO "," int_info "}"
int_info: TAGS? INT
node: locus_info
    | int_info
    | TAGS? "{" nodes "}" -> subnodes
    | TAGS -> onlytags
nodes: node?
     | node ("," node)*
STRAND_INFO: "strand" SIGN
SIGN: "plus" | "minus"
TAGS: TAGWORD (WS TAGWORD)*
TAGWORD: ("_"|LETTER)("_"|"-"|LETTER|DIGIT)*
%import common.WS
%import common.LETTER
%import common.DIGIT
%import common.INT
%ignore WS
"""

parser = Lark(grammar, start="thing", ambiguity="explicit")
parsed = parser.parse(text)
print(parsed.pretty())
```
Output:

```
thing
  subnodes
    nodes
      subnodes
        loc int
        nodes
          node
            int_info
              from
              0
          node
            int_info
              to
              1093
          onlytags	strand plus
          node
            int_info
              id gi
              384632836
```
As shown in this example, the `ambiguity="explicit"` option should make the parser display the alternative matches, grouped under an `_ambig` label. No `_ambig` label appears in the output above, so it seems I don't understand what counts as an ambiguity.
Why is "strand plus" not considered ambiguous? It seems to me that it could be matched either by `STRAND_INFO` or by `onlytags`.
Similarly, I would expect "loc int {...}" to be matched by either `locus_info` or `subnodes`.