I am trying to create a TatSu parser for a language containing C-like expressions. I have the following grammar rules for the expressions:

identifier =
    /[a-zA-Z][A-Za-z0-9_]*/
    ;

expression =
    or_expr
    ;

or_expr =
    '||'<{and_expr}+
    ;

and_expr =
    '&&'<{bitwise_or_expr}+
    ;

bitwise_or_expr =
    '|'<{bitwise_xor_expr}+
    ;

bitwise_xor_expr =
    '^'<{bitwise_and_expr}+
    ;

bitwise_and_expr =
    '&'<{equality_expr}+
    ;

equality_expr =
    ('==' | '!=')<{comparison_expr}+
    ;

comparison_expr =
    ('<' | '<=' | '>' | '>=')<{bitshift_expr}+
    ;

bitshift_expr =
    ('<<' | '>>')<{additive_expr}+
    ;

additive_expr =
    ('+' | '-')<{multiplicative_expr}+
    ;

multiplicative_expr =
    ('*' | '/' | '%')<{unary_expr}+
    ;

unary_expr =
    '+' ~ atom
    | '-' ~ atom
    | '~' ~ atom
    | '!' ~ atom
    | atom
    ;

atom =
    literal
    | helper_call
    | parenthesized
    | var_or_param
    ;

literal =
    value:float type:`float`
    | value:integer type:`int`
    | value:char type:`char`
    | value:string type:`string`
    | value:bool type:`int`
    | value:null type:`null`
    ;

helper_call =
    function:identifier '(' ~ params:expression_list ')'
    ;

var_or_param =
    identifier
    ;

parenthesized =
    '(' ~ @:expression ')'
    ;

I was running into trouble with the atom rule. When parsing the following (the expression being the part between the = and ;):

lastTime = ts + interval;

I got this exception:

tatsu.exceptions.FailedToken: (27:41) expecting '(' :
                lastTime = ts + interval;
                                        ^
helper_call
atom
unary_expr
multiplicative_expr
...

The parser was failing while trying to match the helper_call rule, when the var_or_param rule should have matched just fine. It turns out the cause was an erroneous FailedSemantics raised by the semantic actions for var_or_param. Once I fixed that, parsing worked as expected.

This raises a question: if FailedSemantics affects the parsing logic, what is the proper way to alert the user to a semantic error when the parse itself is correct and the parser should not attempt different choices or rules? Examples would be type mismatches or a variable used before its declaration. (Ideally in a way that would still show the line number where the error occurred.)

Dominick Pastore
  • Have you tried enabling tracing, to see the progress of the parsing? – Apalala Apr 18 '20 at 17:58
  • `FailedSemantics` does affect the parsing. It gets translated to a `FailedParse` in the parse logic. – Apalala Apr 18 '20 at 17:59
  • @Apalala No, I will try that. For the second comment, that explains why I was having trouble, then. But that raises the real question: What is the proper way to alert the user when there is a semantic error, but the parse logic is otherwise correct and should not attempt different choices or rules? For example, if an undeclared variable is used, there is a type mismatch, etc.? Ideally in a way that would still show the line number the error occurred on? (Perhaps this should be a new question.) – Dominick Pastore Apr 19 '20 at 18:46
  • Please expand your question to cover semantic errors, and I'll write a proper reply. – Apalala Apr 21 '20 at 12:50
  • @Apalala Thanks. Done. – Dominick Pastore Apr 21 '20 at 15:28

1 Answer

FailedSemantics does affect the parsing. It gets translated to a FailedParse in the parse logic.
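To make the consequence concrete, here is a toy model of an ordered choice, written only with the standard library. It is not TatSu's implementation: the `FailedParse` and `FailedSemantics` classes below are local stand-ins, the `atom` function has just two options, and the rule that the furthest failure is the one reported is an assumption of the sketch. It shows how a semantic failure in `var_or_param` can surface as "expecting '('" from `helper_call`.

```python
class FailedParse(Exception):
    """Toy stand-in for a parse failure; `pos` is where it happened."""

    def __init__(self, pos, message):
        super().__init__(message)
        self.pos = pos
        self.message = message


class FailedSemantics(Exception):
    """Toy stand-in for a failure raised by a semantic action."""


def identifier(text, pos):
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        end += 1
    if end == pos or not text[pos].isalpha():
        raise FailedParse(pos, "expecting identifier")
    return text[pos:end], end


def helper_call(text, pos):
    # Simplified: only calls with an empty argument list are recognised.
    name, pos = identifier(text, pos)
    if not text.startswith("(", pos):
        raise FailedParse(pos, "expecting '('")
    if not text.startswith(")", pos + 1):
        raise FailedParse(pos + 1, "expecting ')'")
    return ("call", name), pos + 2


def var_or_param(text, pos, action):
    name, end = identifier(text, pos)
    try:
        return action(name), end
    except FailedSemantics as e:
        # The semantic failure becomes an ordinary parse failure,
        # so the enclosing choice treats this option as "did not match".
        raise FailedParse(pos, str(e)) from e


def atom(text, pos, action):
    failures = []
    for option in (helper_call, lambda t, p: var_or_param(t, p, action)):
        try:
            return option(text, pos)
        except FailedParse as e:
            failures.append(e)
    # Every option failed: report the failure that got furthest (assumption).
    raise max(failures, key=lambda e: e.pos)


def working_action(name):
    return ("var", name)


def buggy_action(name):
    raise FailedSemantics(f"unknown variable {name!r}")


print(atom("ts", 0, working_action))
try:
    atom("ts", 0, buggy_action)
except FailedParse as e:
    print(e.pos, e.message)
```

With `working_action` the second option matches. With `buggy_action` both options fail, and the failure that is reported comes from `helper_call`, which is misleading in the same way as the traceback in the question.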

If the parsing should stop, then keep using FailedSemantics.

In other scenarios it's up to you.

TatSu is designed so that most semantic checks are done after the parse has succeeded, through a walker or other means.
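A minimal sketch of such a post-parse check follows, assuming the parse has already produced a tree whose nodes carry a line number. The `Node`, `SemanticError`, and `DeclarationChecker` names are hypothetical and use only the standard library. In a real project the tree would come from TatSu (as far as I know, parsing with `parseinfo=True` records positions on nodes, and `tatsu.walkers.NodeWalker` offers similar dispatch), but neither is used here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node; `line` is the source line it came from."""
    kind: str
    line: int
    name: str = ""
    children: list = field(default_factory=list)


class SemanticError(Exception):
    def __init__(self, line, message):
        super().__init__(f"line {line}: {message}")
        self.line = line


class DeclarationChecker:
    """Walks a finished tree and collects use-before-declaration errors."""

    def __init__(self):
        self.declared = set()
        self.errors = []

    def walk(self, node):
        # Dispatch on the node kind, falling back to visiting the children.
        handler = getattr(self, f"walk_{node.kind}", self.walk_default)
        handler(node)
        return self.errors

    def walk_default(self, node):
        for child in node.children:
            self.walk(child)

    def walk_declare(self, node):
        self.walk_default(node)  # check any initializer first
        self.declared.add(node.name)

    def walk_assign(self, node):
        self.check_declared(node)
        self.walk_default(node)

    def walk_var(self, node):
        self.check_declared(node)

    def check_declared(self, node):
        if node.name not in self.declared:
            self.errors.append(
                SemanticError(node.line, f"undeclared variable {node.name!r}")
            )


# Hand-built tree for: declare ts; declare lastTime; lastTime = ts + interval;
program = Node("block", 1, children=[
    Node("declare", 1, name="ts"),
    Node("declare", 2, name="lastTime"),
    Node("assign", 3, name="lastTime", children=[
        Node("binary", 3, name="+", children=[
            Node("var", 3, name="ts"),
            Node("var", 3, name="interval"),
        ]),
    ]),
])

errors = DeclarationChecker().walk(program)
for error in errors:
    print(error)  # line 3: undeclared variable 'interval'
```

Because the check runs after parsing, an error here cannot make the parser backtrack into another rule, and collecting errors in a list lets several problems be reported in one pass.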

Apalala
  • That explains why I was originally having trouble, but the core question remains: What is the proper way to signal that there is a semantic issue, but the parsing is otherwise correct? For example, type mismatches, undeclared variables, etc. If we just raise some other exception, TatSu doesn't report any useful information like the line number. – Dominick Pastore Mar 25 '21 at 21:11
  • If the parsing should stop, then keep using `FailedSemantics`. In other scenarios it's up to you. TatSu is designed so most semantic checks are done _after_ the parse succeeds, through a walker or other means. – Apalala Mar 26 '21 at 12:17
  • Ok. It sounds like I was abusing the semantics class for more than was intended, then. I'll have to refactor that part of the code. (If you add that note to the answer, I can mark it as accepted.) – Dominick Pastore Mar 27 '21 at 13:55