Is this just a flawed grammar?

Question

I was looking through a grammar for FOCAL and found that someone had defined numbers as follows:

number
   : mantissa ('e' signed_)?
   ;

mantissa
   : signed_
   | (signed_ '.')
   | ('.' signed_)
   | (signed_ '.' signed_)
   ;

signed_
   : PLUSMIN? INTEGER
   ;

PLUSMIN
   : '+'
   | '-'
   ;

I was curious because I think this means that, for example, 1.-1 would be recognized as a number by the grammar rather than as a subtraction. Would a branch with unsigned_ be worth adding to prevent this? I guess this is more of a question for the author, but are there any benefits to structuring it this way (besides the obvious one of avoiding a float vs. int distinction)?
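To check the concern concretely, here is a rough Python sketch that transcribes the quoted rules into a regular expression. It is only an approximation of the parser: it assumes INTEGER means one or more digits and it ignores any whitespace handling between tokens. The names `NUMBER` and `is_number` are made up for this illustration.

```python
import re

# Regex transcription of the rules quoted above (an approximation:
# whitespace handling between tokens is ignored).
INTEGER = r"[0-9]+"            # assumption: INTEGER is one or more digits
SIGNED = rf"[+-]?{INTEGER}"    # signed_ : PLUSMIN? INTEGER
MANTISSA = (
    rf"(?:{SIGNED}\.{SIGNED}"  # signed_ '.' signed_
    rf"|{SIGNED}\."            # signed_ '.'
    rf"|\.{SIGNED}"            # '.' signed_
    rf"|{SIGNED})"             # signed_
)
# number : mantissa ('e' signed_)?
NUMBER = re.compile(rf"{MANTISSA}(?:e{SIGNED})?")


def is_number(text: str) -> bool:
    """Return True if the whole string matches the transcribed `number` rule."""
    return NUMBER.fullmatch(text) is not None


for sample in ["1.-1", ".-4", "-1.5e+3", "1..2"]:
    print(sample, is_number(sample))
```

Under these assumptions 1.-1 and .-4 are both accepted, because the second signed_ in the mantissa carries its own optional sign.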

You are referring to [this grammar](https://github.com/antlr/grammars-v4/blob/master/focal/focal.g4), right? — rici, Sep 20 '22 at 16:01

Mike Cargal · Accepted Answer · 2022-09-20T20:59:33.450

0

It’s not necessarily flawed.

It does appear that it will recognize 1.-1 as a mantissa. However, that doesn’t mean that some post-parse validation doesn’t catch this problem.

It would be flawed if there’s an alternative, valid interpretation of 1.-1.

Sometimes, it’s just useful to recognize an invalid construct and produce a parse tree for “the only way to interpret this input”, and then you can detect it in a listener and give the user an error message that might be more meaningful than the default message that ANTLR would produce.
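As a minimal sketch of what such a post-parse check could look like, the Python function below inspects the text that was matched as a number and reports a misplaced sign. It does not use the ANTLR runtime; in a real listener a check like this would typically run in the exit callback for the number rule on the matched text. The function name and the message wording are hypothetical.

```python
from typing import Optional


def misplaced_sign_error(text: str) -> Optional[str]:
    """Return an error message if a sign appears anywhere other than
    at the start of the number or directly after 'e'; otherwise None."""
    for index, char in enumerate(text):
        if char in "+-" and index > 0 and text[index - 1] != "e":
            return (
                f"invalid number '{text}': sign '{char}' at position {index} "
                "is only allowed at the start or after 'e'"
            )
    return None


for sample in ["1.-1", "-1.5e+3", "-.4"]:
    print(sample, misplaced_sign_error(sample))
```

The permissive grammar still produces a parse tree for 1.-1, and the check turns that tree into a targeted message instead of a generic syntax error.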

And, then again, it could also just be an oversight.

The `signed_` rule, on the other hand, being

signed_ : PLUSMIN? INTEGER;

instead of

signed_ : PLUSMIN? INTEGER+;

does make this grammar somewhat suspect as a good example to work from.

edited Sep 20 '22 at 20:59

answered Sep 20 '22 at 13:37


In the grammar in [the Antlr4 grammars repo](https://github.com/antlr/grammars-v4/), `INTEGER` is defined as `DIGIT+`, so that part is OK. At the risk of being judged judgemental, I'd say that the fact that `DIGIT` is not declared to be a fragment is a flaw for an example grammar. – rici Sep 20 '22 at 16:07

Also, there is most certainly another, more reasonable, interpretation of `1.-1`: a subtraction. The fact that `signed_` is not a token means that the grammar would also allow `1. -1` as a `number`, although I suppose rule ordering resolves the ambiguity. Again, I would say "flawed" (which means "imperfect", not "unusable"). And there is also the unnecessary lookahead... – rici Sep 20 '22 at 16:17

score 0 · Answer 2 · answered Sep 20 '22 at 13:56

Your analysis looks correct to me, namely that:

1.-1 is recognized as a number
a branch with unsigned_ could fix it

Calling it "flawed" sounds like a value judgement, which does not seem relevant.

If it were for my own use, I would prefer to:

recognize 0.-4 as an invalid number
recognize -.4 as a valid number

So I would prefer something like:

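number
   : signed_float ('e' signed_integer)?
   ;

// An optional sign is allowed only in front of the whole mantissa.
signed_float
   : PLUSMIN? unsigned_float
   ;

unsigned_float
   : unsigned_integer
   | (unsigned_integer '.')
   | ('.' unsigned_integer)
   | (unsigned_integer '.' unsigned_integer)
   ;

signed_integer
   : PLUSMIN? unsigned_integer
   ;

// Assumes the INTEGER token from the original grammar.
unsigned_integer
   : INTEGER
   ;

PLUSMIN
   : '+'
   | '-'
   ;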
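A quick way to sanity-check the two stated preferences is to transcribe this alternative into a regular expression. The Python sketch below does that under the assumption that the integer rules bottom out in one or more digits; it ignores whitespace between tokens, and the names `NUMBER` and `is_number` are made up for the illustration.

```python
import re

# Regex transcription of the proposed rules (an approximation:
# whitespace handling between tokens is ignored).
UNSIGNED_INTEGER = r"[0-9]+"   # assumption: one or more digits
UNSIGNED_FLOAT = (
    rf"(?:{UNSIGNED_INTEGER}\.{UNSIGNED_INTEGER}"  # digits '.' digits
    rf"|{UNSIGNED_INTEGER}\."                      # digits '.'
    rf"|\.{UNSIGNED_INTEGER}"                      # '.' digits
    rf"|{UNSIGNED_INTEGER})"                       # digits
)
SIGNED_FLOAT = rf"[+-]?{UNSIGNED_FLOAT}"       # sign only before the mantissa
SIGNED_INTEGER = rf"[+-]?{UNSIGNED_INTEGER}"   # sign only before the exponent
NUMBER = re.compile(rf"{SIGNED_FLOAT}(?:e{SIGNED_INTEGER})?")


def is_number(text: str) -> bool:
    """Return True if the whole string matches the transcribed `number` rule."""
    return NUMBER.fullmatch(text) is not None


for sample in ["-.4", "0.-4", "1.-1", "-1.5e+3"]:
    print(sample, is_number(sample))
```

With the sign allowed only in front of the mantissa and the exponent, -.4 is accepted while 0.-4 and 1.-1 are rejected as single numbers.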
