
I am trying to parse some code using sly. I would like to separate the statements with a semicolon.

I have defined a token called SEMI which represents a semicolon:

class MyLexer(Lexer):
    tokens = {
        ...,
        SEMI
    }

    SEMI = r";"
    ...

If I use SEMI inside the parser class like so:

class MyParser(Parser):
    ...
    @_("OUTPUT expr SEMI")
    def statement(self, p):
        return ("output", p.expr)

and then parse input that contains several statements, each terminated by a semicolon, like so:

output 1;output 2;

I get the following error:

sly: Syntax error at line 1, token=OUTPUT

Does anyone know how to make sly parse multiple statements which are separated with a semicolon (or any other character, such as a newline)?

macic
  • Welcome to Stack Overflow. The problem here isn't the separation, but the "multiple" aspect. The [documentation](https://sly.readthedocs.io/en/latest/sly.html) covers this and many more things, but it does seem to assume you have some understanding of parsing theory in general. – Karl Knechtel Jan 11 '23 at 23:51
  • This was good to see, though. Many years ago, I used `ply` (the predecessor), around the time that many people were starting to migrate to Python 3.x. I lamented that `ply` was written for 2.x, had some subtle ways of breaking even after making the obvious fixes for a 3.x environment, and wasn't taking advantage of nice new features like decorators. It's nice to see that, in 2016, the same author apparently put out a new tool addressing all of that. – Karl Knechtel Jan 11 '23 at 23:54

2 Answers


If you just say that a statement has the form output <expr> ;, and you tell the parser to parse a statement, then it will parse a statement. Not "some number of statements". One statement. The second statement in the input doesn't match the grammar.

If you want to parse a program consisting of a number of statements, you have to do that explicitly:

@_("{ statement }")
def program(self, p):
    return p.statement

Note that the parser will attempt to parse the non-terminal produced by the first rule in the grammar, unless you configure a start symbol. Do make sure your grammar starts with the non-terminal you want to match.

Note:

The version of Sly currently on GitHub (which, according to Sly's author, is no longer being maintained or extended) includes a partial implementation of EBNF optional and repeating elements, which I used in the above code. I apologise for using the wrong syntax in the first version of this answer.

rici

By default, the parser parses only one statement. To parse several, add a left-recursive rule that collects the statements:

@_('statements')
def program(self, p):
    return p.statements

@_('statement')
def statements(self, p):
    return (p.statement, )

@_('statements statement')
def statements(self, p):
    return p.statements + (p.statement, )
macic
  • By default, the parser parses the non-terminal specified in the first parsing function in the grammar. You can cause the parser to parse a different non-terminal by adding the class data member `start` (e.g. `start = 'program'`). Either way, the parser expects to find an end-of-input indication immediately after the expansion of the start symbol, which means that it just parses one thing. There's a good example of how Sly parses in the documentation, which notes that "A parse is only successful if the parser reaches a state where the symbol stack is empty **and there are no more input tokens**." – rici Jan 16 '23 at 16:53
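
The end-of-input rule described in that comment can be illustrated without Sly. The following is a hand-written sketch, not Sly's LALR algorithm, and every name in it (`tokenize`, `parse_single`, `parse_program`) is hypothetical. It assumes the only expression is an integer literal.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<OUTPUT>output)|(?P<NUMBER>\d+)|(?P<SEMI>;))")

def tokenize(text):
    """Turn the input into a list of (kind, value) pairs."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"Illegal character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_statement(tokens, i):
    """Match OUTPUT NUMBER SEMI at index i; return (tree, next index)."""
    kinds = [kind for kind, _ in tokens[i:i + 3]]
    if kinds != ["OUTPUT", "NUMBER", "SEMI"]:
        raise SyntaxError(f"Syntax error at token index {i}")
    return ("output", int(tokens[i + 1][1])), i + 3

def parse_single(tokens):
    """Start symbol is 'statement': exactly one, then end of input."""
    stmt, i = parse_statement(tokens, 0)
    if i != len(tokens):
        # Leftover input: the same situation as in the question.
        raise SyntaxError(f"Syntax error, token={tokens[i][0]}")
    return stmt

def parse_program(tokens):
    """Start symbol is 'program': statements until end of input.
    Like the EBNF form, this accepts zero statements."""
    stmts, i = [], 0
    while i < len(tokens):
        stmt, i = parse_statement(tokens, i)
        stmts.append(stmt)
    return tuple(stmts)

tokens = tokenize("output 1;output 2;")
print(parse_program(tokens))
try:
    parse_single(tokens)
except SyntaxError as exc:
    print(exc)
```

Parsing a single statement fails on the second OUTPUT token because input remains after the start symbol has been matched, while the program-level loop consumes everything up to the end of input.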