2

This Lark parser is based on this question and this website, but it fails when parsing a OR b OR c.

The website suggests:

<expression>::=<term>{<or><term>}
<term>::=<factor>{<and><factor>}
<factor>::=<constant>|<not><factor>|(<expression>)
<constant>::= false|true
<or>::='|'
<and>::='&'
<not>::='!'

Which appears consistent with the other question. My implementation and test cases...

import lark

PARSER = lark.Lark("""
    ?exp: term | term OR term
    ?term: factor | factor AND factor
    ?factor: symbol | NOT factor | "(" exp ")"
    symbol: /[a-z]+/
    AND: "AND"
    OR: "OR"
    NOT: "NOT"
    %ignore " "
""", start='exp')

qs = [
        'a',
        'NOT a',
        'a OR b',
        'a OR b OR c',
        'a AND b AND c',
        'NOT (a AND b AND c) OR NOT (b OR c)',
        'NOT a AND NOT b',
    ]

for q in qs:
    t = PARSER.parse(q)

Running it:

$ python ./foo.py 
Traceback (most recent call last):
  File "./foo.py", line 26, in <module>
    t = PARSER.parse(q)
  File "/tmp/v/lib64/python3.6/site-packages/lark/lark.py", line 311, in parse
    return self.parser.parse(text, start=start)
  File "/tmp/v/lib64/python3.6/site-packages/lark/parser_frontends.py", line 185, in parse
    return self._parse(text, start)
  File "/tmp/v/lib64/python3.6/site-packages/lark/parser_frontends.py", line 54, in _parse
    return self.parser.parse(input, start, *args)
  File "/tmp/v/lib64/python3.6/site-packages/lark/parsers/earley.py", line 292, in parse
    to_scan = self._parse(stream, columns, to_scan, start_symbol)
  File "/tmp/v/lib64/python3.6/site-packages/lark/parsers/xearley.py", line 137, in _parse
    to_scan = scan(i, to_scan)
  File "/tmp/v/lib64/python3.6/site-packages/lark/parsers/xearley.py", line 114, in scan
    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan}, set(to_scan))
lark.exceptions.UnexpectedCharacters: No terminal defined for 'O' at line 1 col 8

a OR b OR c
       ^

Expecting: {'AND'}

Where have I gone wrong? Is my conversion of term { OR term } to term | term OR term wrong?

rrauenza
  • 6,285
  • 4
  • 32
  • 57

2 Answers2

3

I'm not familiar with Lark specifically, but normally if there's no direct way to implement optional repeating, these kinds of grammars are implemented as

    ?exp: term | exp OR term
    ?term: factor | term AND factor

From what I can find in documentation, Lark does support this kind of construct directly, though:

    ?exp: term (OR term)*
    ?term: factor (AND factor)*

These do result in different syntax trees:

# first parser output
Tree(exp, [
    Tree(exp, [
        Tree(symbol, [Token(__ANON_0, 'a')]),
        Token(OR, 'OR'), 
        Tree(symbol, [Token(__ANON_0, 'b')])]),
    Token(OR, 'OR'),
    Tree(symbol, [Token(__ANON_0, 'c')])])
# second parser output
Tree(exp, [
    Tree(symbol, [Token(__ANON_0, 'a')]),
    Token(OR, 'OR'),
    Tree(symbol, [Token(__ANON_0, 'b')]),
    Token(OR, 'OR'),
    Tree(symbol, [Token(__ANON_0, 'c')])])
Random832
  • 37,415
  • 3
  • 44
  • 63
  • Aha! I just realized I assumed `{ }` is a single optional in EBNF. I had even tried `[ ]` which is supported by Lark. But `{ }` is actually `( )*` in Lark. – rrauenza Nov 26 '19 at 16:25
  • Also, I think your `?exp: term | exp OR term` is preferred for easier processing. With `( ... )*` I end up with variable args `(sym1, op, sym2, op, sym3, op, ...)` in the transformation function whereas with `term | exp OR term` I end up with a transform func of `(sym1, op, sym2)` Thank you! – rrauenza Nov 26 '19 at 18:24
0

Derivation using grammar given above exp-> term OR term, term won't be able to generate expression containing OR. Change the grammar rules to.

?exp: term | exp OR term
?term: factor | term AND factor

exp: exp OR term, because OR is left associative similarly AND.