
I was trying to simplify the case presented in another question and arrived at the following parsing attempt using lark:

from lark import Lark

text = """
start_thing {
  loc int {
    from 0,
    to 1093,
    strand plus,
    id gi 384632836
  }
}
"""

grammar = """\
thing: "start_thing" node
locus_info: "loc int" "{" int_info "," int_info "," STRAND_INFO "," int_info "}"
int_info: TAGS? INT
node: locus_info
    | int_info
    | TAGS? "{" nodes "}" -> subnodes
    | TAGS                -> onlytags
nodes: node?
    | node ("," node)*
STRAND_INFO: "strand" SIGN
SIGN: "plus" | "minus"
TAGS: TAGWORD (WS TAGWORD)*
TAGWORD: ("_"|LETTER)("_"|"-"|LETTER|DIGIT)*
%import common.WS
%import common.LETTER
%import common.DIGIT
%import common.INT
%ignore WS
"""

parser = Lark(grammar, start="thing", ambiguity="explicit")
parsed = parser.parse(text)
print(parsed.pretty())

Output:

thing
  subnodes
    nodes
      subnodes
        loc int
        nodes
          node
            int_info
              from
              0
          node
            int_info
              to
              1093
          onlytags  strand plus
          node
            int_info
              id gi
              384632836

As shown in this example, the ambiguity="explicit" option should display the alternative matches, grouped under an _ambig node. No such node appears in the output above, so it seems I don't understand what counts as an ambiguity.

Why is "strand plus" not considered ambiguous? It seems to me that it could either be matched by STRAND_INFO or onlytags.

Similarly, I would expect "loc int {...}" to be matched by either locus_info or subnodes.

bli
  • Does it make a difference if you change `"loc int"` to `"loc" "int"`? I wonder if the `%ignore WS` has some unwanted interaction. – rici Jun 22 '18 at 16:09
  • @rici There is no apparent difference if I try this. – bli Jun 22 '18 at 16:13
  • Ok, it was just an idea. I find it a bit odd that you can ignore WS and also use it in a production. I guess I don't understand what they mean by "ignore". – rici Jun 22 '18 at 16:27
  • @rici I don't really understand either. I guess it means that you can ignore them if you want, but you can still use them if you want. – bli Jun 26 '18 at 10:40
  • Why do you expect `subnodes` to match `loc int`? `subnodes`, according to the grammar, have to *contain* `locus_info`, and in your example text, it doesn't. – Erez Jul 12 '18 at 06:00
  • @Erez I thought the grammar I wrote said that `subnodes` have to contain comma-separated `node`s (some of them may contain `locus_info`, which is a kind of `node`, but not necessarily). – bli Jul 25 '18 at 12:37
