As was suggested in several comments, your grammar is ambiguous and contains left-recursion and so is a problem for recursive-descent parsers like most parser combinator libraries generate. See e.g. the Q&As How to remove ambiguity in the following grammar? and Left recursion elimination for how to do this.
In your case,
S ::= S "!" | S "*" S | natural
has left-recursion because both the "!"
and the "*"
production begin with an S
. It is ambiguous because it is unclear which of the S
's should be derived first in the "*"
production: Given an expression like 1 * 2 * 3
, should this produce
* *
/ \ / \
1 * or * 3 ?
/ \ / \
2 3 1 2
1 * (2 * 3) (1 * 2) * 3
It is also ambiguous, as @melpomene points out, because 1 * 2 !
could produce
* !
/ \ |
1 ! or *
| / \
2 1 2
1 * (2 !) (1 * 2) !
An example of a rewritten grammar (there are others) that has neither left-recursion or ambiguities is:
S ::= natural S₁
S₁ ::= "!" | "*" S | ε
With this grammar, 1 * 2 * 3
will always parse as
S
1 S₁
1 * S
1 * 2 S₁
1 * 2 * S
1 * 2 * 3 S₁
1 * 2 * 3 ε
meaning *
becomes right-associative. And 1 * 2 !
will always parse as
S
1 S₁
1 * S
1 * 2 S₁
1 * 2 !
meaning !
gets a higher precedence than *
, which I don't know is good or bad.
Either way, if you want the parser to express arbitrary expressions, you probably want to extend the grammar with explicit parentheses so that you can override the default precedence of each operator.
As for the parser itself, you could model it directly on a rewritten grammar, e.g.:
parseS :: Parser S
parseS = do
n <- natural
f <- parseS1
return (f n)
natural :: Parser S
natural = do
n <- read <$> many1 digit
return (Nat n)
parseS1 :: Parser (S -> S)
parseS1 = parseFac <|> parseMul <|> parseNat
where
parseFac = do
char '!'
return (\s -> Fac s)
parseMul = do
char '*'
s2 <- parseS
return (\s1 -> Mul s1 s2)
parseNat = do
eof -- ε
return (\s -> s)
Then you'll have to deal with whitespace:
> parse parseS "" "1*2*3"
Right (Mul (Nat 1) (Mul (Nat 2) (Nat 3)))
> parse parseS "" "1 * 2 * 3"
Left (line 1, column 2):
unexpected ' '
expecting digit, "!", "*" or end of input
> parse parseS "" " 1*2*3"
Left (line 1, column 1):
unexpected " "
expecting digit
> parse parseS "" "1*2*3 "
Left (line 1, column 6):
unexpected ' '
expecting digit, "!", "*" or end of input
I'd consult a tutorial or a book on getting this part right.
Finally you may want to use some of the higher-level features of various parser combinator libraries like chainr1
or Text.Megaparsec.Expr
's makeExprParser
that try to deal with this kind of thing in a less bothersome way. Still, before using them, it is wise to understand how they're implemented by making parsers manually as an exercise like you're currently doing. For example, how would you transform the parser above so that "*"
is left-associative or that "!"
has a lower precedence?