I wrote a PEG parser generator just for fun (I will publish it on NPM some time), and thought it would be easy to add a randomised phrase generator on top of it. The idea is to automatically get correct phrases, given a grammar. So I set the following rules to generate strings from each type of parsers :
- Sequence
p1 p2 ... pn
: Generate a phrase for each subparser and return the concatenation. - Alternative
p1 | p2 | ... | pn
: Randomly pick a subparser and generate a phrase with it. - Repetition
p{n, m}
: Pick a numberx
in[n, m]
(or[n, n+2]
ism === Infinity
) and return a concatenation ofx
generated phrases fromp
. - Terminal : Just return the terminal literal.
When I take the following grammar :
S: NP VP
PP: P NP
NP: Det N | Det N PP | 'I'
VP: V NP | VP PP
V: 'shot' | 'killed' | 'wounded'
Det: 'an' | 'my'
N: 'elephant' | 'pajamas' | 'cat' | 'dog'
P: 'in' | 'outside'
It works great. Some examples :
my pajamas killed my elephant
an pajamas wounded my pajamas in my pajamas
an dog in I wounded my cat in I outside my elephant in my elephant in an pajamas outside an cat
I wounded my pajamas in my dog
This grammar has a recursion (PP: P NP
> NP: Det N PP
). When I take this other recursive grammar, for math expression this time :
expr: term (('+' | '-') term)*
term: fact (('*' | '/') fact)*
fact: '1' | '(' expr ')'
Almost one time in two, I get a "Maximum call stack size exceeded" error (in NodeJS). The other half of the time, I get correct expressions :
( 1 ) * 1 + 1
( ( 1 ) / ( 1 + 1 ) - ( 1 / ( 1 * 1 ) ) / ( 1 / 1 - 1 ) ) * 1
( ( ( 1 ) ) )
1
1 / 1
I guess the recursive production for fact
gets called too often, too deep in the call stack and this makes the whole thing just blow off.
How can I make my approach less naive in order to avoid those cases that explode the call stack ? Thank you.