0

I was able to add support to my parser's grammar for alternating characters (e.g. ababa or baba) by following along with this question.

I'm now looking to extend that by allowing repeats of characters.

For example, I'd like to be able to support abaaabab and aababaaa as well. In my particular case, only the a is allowed to repeat but a solution that allows for repeating b's would also be useful.

Given the rules from the other question:

expr ::= A | B
A ::= "a" B | "a"
B ::= "b" A | "b"

... I tried extending it to support repeats, like so:

expr ::= A | B

# support 1 or more "a"
A_one_or_more = A_one_or_more "a" | "a"

A ::= A_one_or_more B | A_one_or_more
B ::= "b" A | "b"

... but that grammar is ambiguous. Is it possible for this to be made unambiguous, and if so could anyone help me disambiguate it?

I'm using the lemon parser which is an LALR(1) parser.

Community
  • 1
  • 1
JesseBuesking
  • 6,496
  • 4
  • 44
  • 89

2 Answers2

1

Here is a nice tool for checking your grammar online http://smlweb.cpsc.ucalgary.ca/start.html. It actually accepts the grammar you provided as a valid LALR(1) grammar.

A different LALR(1) grammar, that allows reapeating a's, would be:

S ::= "a" S | "a" | "b" A | "b"
A ::= "a" S .
r33tnup
  • 319
  • 2
  • 9
  • I attempted to simplify my grammar for this question, and apparently I simplified it too much since it is still ambiguous. The a's and b's in my example are actually the results of sub-rules, and somehow that causes my ambiguity. Thank you for your response! – JesseBuesking Dec 19 '16 at 18:55
1

The point of parsing, in general, is to parse; that is, determine the syntactic structure of an input. That's significantly different from simply verifying that an input belongs to a language.

For example, the language which consists of arbitrary repetitions of a and b can be described with the regular expression (a|b)*, which can be written in BNF as

S ::= /* empty */  | S a | S b

But that probably does not capture the syntactic structure you are trying to defind. On the other hand, since you don't specify that structure, it is hard to know.

Here are a couple more possibilities, which build different parse trees:

S ::= E | S E
E ::= A b | E b
A ::= a | A a

S ::= E | S E
E ::= A B
A ::= a | A a
B ::= b | B b

When writing a grammar to parse a language, it is useful to start by drawing your proposed parse trees. Usually, you can write the grammar directly from the form of the trees, which shows that a formal grammar is primarily a documentation tool, since it clearly describes the language in a way that informal descriptions cannot. Using a parser generator to turn that grammar into a parser ensures that the parser implements the described language. Or, at least, that is the goal.

rici
  • 234,347
  • 28
  • 237
  • 341
  • Thank you for your detailed response. It absolutely makes sense, however unfortunately my capabilities in this area are lacking so I'm having a tough time resolving my ambiguity. I'll keep hacking away at my particular problem and hopefully I'll be able to resolve it. Thank you again! – JesseBuesking Dec 19 '16 at 18:53
  • @JesseBuesking: You might get a better answer if you ask a more precise question. From your comment to the other answer, I gather that your grammar does not really look like the one you are asking about. However, I stand by my statement that your best bet is to draw some syntax diagrams in order to clarify your thinking. – rici Dec 19 '16 at 23:02