I'm trying to add some simple error handling to a DSL grammar in TatSu. I wrote a simple grammar that parses input into either numbers or errors.

@@grammar :: Nums

start = wordlist $ ;

wordlist = {word eol}+ ;

eol = ':' ;

word
  = 
  | num:num     # Number.
  | err:err     # Not a number, error.
  | eol:eol     # Blank line.
  ;

num
  =
  | sci:sci      # Scientific 'e' notation
  | float:float  # Normal real number notation
  | int:int      # Integer
  ;


int = /[-+]?\d+\.?/ ;
float = /[-+]?\d*\.\d+/ ;
sci = /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/ ;

err = ->&eol ;

The input looks like:

123 : 2.3 : -1.: Error : -.0123 : -2.1e-2 : +1.2e+3 :

With tracing on I can see that it is correctly parsing all of the input. When it gets to the end, it seems to continue parsing in an infinite loop. From the last number to the start of the looping, the output looks like:

↙word↙wordlist↙start ~1:44
 +1.2e+3 :                                                                                                          
↙num↙word↙wordlist↙start ~1:45
+1.2e+3 :                                                                                                           
↙sci↙num↙word↙wordlist↙start ~1:45
+1.2e+3 :                                                                                                           
≡'+1.2e+3' /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/
 :                                                                                                                  
≡sci↙num↙word↙wordlist↙start ~1:52
 :                                                                                                                  
≡num↙word↙wordlist↙start ~1:52
 :                                                                                                                  
≡word↙wordlist↙start ~1:52
 :                                                                                                                  
↙eol↙wordlist↙start ~1:52
 :                                                                                                                  
≡':' 
≡eol↙wordlist↙start ~1:54
↙word↙wordlist↙start ~1:54
↙num↙word↙wordlist↙start 
↙sci↙num↙word↙wordlist↙start 
≢'' /([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)/
↙float↙num↙word↙wordlist↙start 
≢'' /[-+]?\d*\.\d+/
↙int↙num↙word↙wordlist↙start 
≢'' /[-+]?\d+\.?/
≢num↙word↙wordlist↙start 
↙err↙word↙wordlist↙start 
↙eol↙err↙word↙wordlist↙start 
≢':' 
≢eol↙err↙word↙wordlist↙start 
↙eol↙err↙word↙wordlist↙start 
≢eol↙err↙word↙wordlist↙start 
   :
   :
   :

For the life of me, I can't figure out why it doesn't stop. I'm not even sure what it is trying to parse. Can anyone help?

Thanks!

tjgriffin

1 Answer

This is not a TatSu issue. The grammar given doesn't parse the desired language: eol shows up as the terminator in wordlist, as an alternative in word, and as the lookahead in err.

You can try something like this:

wordlist = (eol).{word} $;

eol = ':' ;

word
  = 
  | num:num     # Number.
  | err:err     # Not a number, error.
  ;

err = ->&(eol|$) ;

A clearer version might be:

wordlist = ':'.{word} $;

word
  = 
  | num:num     # Number.
  | err:err     # Not a number, error.
  ;

err = ->&(':'|$) ;

By the way, very nice use of ->& for a recovery rule!
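
As a rough cross-check of what the grammar is meant to produce on the sample input, here is a plain-Python sketch (it does not use TatSu). It reuses the three regular expressions from the question and tries them in the same order as the num rule, falling back to err. The helper name classify is hypothetical, and two simplifications are assumed: ':' is treated as a field terminator, and each field must match a pattern in full, whereas TatSu matches patterns at the current position.

```python
import re

# Patterns copied from the question's grammar; the order mirrors the `num` rule.
PATTERNS = [
    ("sci", re.compile(r"([+-]?\d*\.?\d+[Ee][+-]?\d+)|(^[+-]?\d+\.?\d*[Ee][+-]?\d+)")),
    ("float", re.compile(r"[-+]?\d*\.\d+")),
    ("int", re.compile(r"[-+]?\d+\.?")),
]


def classify(text):
    """Hypothetical helper: label each ':'-terminated field as sci, float, int or err."""
    fields = [field.strip() for field in text.split(":")]
    # Assumption: ':' terminates a field, so the empty piece after a trailing ':' is dropped.
    if fields and fields[-1] == "":
        fields.pop()
    result = []
    for field in fields:
        for name, pattern in PATTERNS:
            if pattern.fullmatch(field):
                result.append((name, field))
                break
        else:
            # No number pattern matched the whole field: treat it as an error.
            result.append(("err", field))
    return result


sample = "123 : 2.3 : -1.: Error : -.0123 : -2.1e-2 : +1.2e+3 :"
for kind, value in classify(sample):
    print(kind, repr(value))
```

On the sample input, only the Error field should come out as err; a TatSu parse of the grammar above can be compared against that.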

Apalala
  • Thanks! That worked. The separator apparently can't be another rule, so: wordlist = ':'.{word} $ ; Also, I used: err = {/[^:]+/}&(eol|$) ; in order to retain the 'skipped' input. I wanted blank lines to be legal, but I'll handle it in semantics. Still not clear: what is my grammar doing? Why didn't it stop? I thought the $ in the definition of start tells it to stop when at the end of input. When should one test for $? I tested removing $ from start, and it still worked. I won't leave it that way; my example may not be exhaustive, or the real grammar may need it. Cheers! – tjgriffin Jul 19 '21 at 15:15
  • `$` is just another input symbol, marking the end of input. It's good to always use it, or parsing may report success over only part of the input. In other cases it can be used to stop recursion when the input is exhausted. – Apalala Jul 19 '21 at 20:33
  • Sorry about the error in the grammar I proposed. It's caused by operator precedence. Take a look at the new version using `()` around the separator rule. – Apalala Jul 19 '21 at 20:35
  • Ah, yes! Very nice. I'd have been fine with ':' , but (eol) is better. Thanks! – tjgriffin Jul 20 '21 at 23:13
  • Yet, you're right. It doesn't make sense to define a rule for a single literal. See the amendments to my answer. – Apalala Jul 22 '21 at 18:09
  • Both are fine. I like eol, because if I decide to change it, I only have to do it in one place. Plus it documents my intent. It makes the traces clearer. And if I'm searching the code for a bug, the eol will stand out more. – tjgriffin Jul 23 '21 at 21:24
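
The comment about `$` (success being reported over only part of the input) can be illustrated by analogy with Python's re module. This is not TatSu code, just a sketch of the same idea: re.match accepts a matching prefix, while re.fullmatch also requires the input to be exhausted, which is roughly the role `$` plays at the end of a start rule. The pattern and sample text are made up for illustration.

```python
import re

# Hypothetical pattern and input, chosen only to show prefix vs. whole-input matching.
number = re.compile(r"[-+]?\d+")
text = "123 oops"

# Without an end-of-input check, matching a prefix counts as success.
prefix = number.match(text)

# Requiring the whole input to be consumed rejects the same text.
whole = number.fullmatch(text)

print(prefix.group(0) if prefix else None)
print(whole)
```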