How do I parse these operations?

Question

I have a data type

data S = Fac S | Mul S S | Nat Integer
deriving (Show)

and a grammar defined as

S ::= S "!" | S "*" S | natural

So far I have this written

pa :: Parser S
pa = facst <|> multst <|> natst

facst = do
    s <- pa
    char '!'
    return (Fac s)

multst = do
    s1 <- pa
    char '*'
    s2 <- pa
    return (Mul s1 s2)

natst = do
    n <- natural
    return (Nat n)

But facst and multst doesn't work. And natst only works with single integers like "5" but not "56". I've tried doing

natst = do
    n <- some natural
    return (Nat n)

but that creates an error. Could someone point me in the right direction?

Try to have a look at this one https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/parsing-floats-with-parsec — keep_learning, Aug 16 '19 at 04:06
@Dair but if I have, natural = fmap read (some (satisfy isDigit)) <* whitespaces, shouldn't it already be parsing multiple for natural numbers? — all, Aug 16 '19 at 04:07
`pa` calls `facst` first; `facst` calls `pa` first. Infinite recursion. — melpomene, Aug 16 '19 at 06:43
Your grammar is not a LL(1) grammar. You should perform some factoring on it before writing a parser. Doing that, you should probably modify it so to remove the ambiguities. — chi, Aug 16 '19 at 07:57
The grammar you present here is "*left recursive*", you can not parse such grammars with LL(1)". It might be better to use a LALR parser instead. First you will have to make your grammer unambiguous. — Willem Van Onsem, Aug 16 '19 at 10:00

score 2 · Answer 1 · answered Aug 16 '19 at 11:22

As was suggested in several comments, your grammar is ambiguous and contains left-recursion and so is a problem for recursive-descent parsers like most parser combinator libraries generate. See e.g. the Q&As How to remove ambiguity in the following grammar? and Left recursion elimination for how to do this.

In your case,

S ::= S "!" | S "*" S | natural

has left-recursion because both the "!" and the "*" production begin with an S. It is ambiguous because it is unclear which of the S's should be derived first in the "*" production: Given an expression like 1 * 2 * 3, should this produce

  *                  *
 / \                / \
1   *      or      *   3  ?
   / \            / \
  2   3          1   2

1 * (2 * 3)      (1 * 2) * 3

It is also ambiguous, as @melpomene points out, because 1 * 2 ! could produce

  *            !
 / \           |
1   !    or    *
    |         / \
    2        1   2

1 * (2 !)    (1 * 2) !

An example of a rewritten grammar (there are others) that has neither left-recursion or ambiguities is:

S  ::= natural S₁
S₁ ::= "!" | "*" S | ε

With this grammar, 1 * 2 * 3 will always parse as

S
1 S₁
1 * S
1 * 2 S₁
1 * 2 * S
1 * 2 * 3 S₁
1 * 2 * 3 ε

meaning * becomes right-associative. And 1 * 2 ! will always parse as

S
1 S₁
1 * S
1 * 2 S₁
1 * 2 !

meaning ! gets a higher precedence than *, which I don't know is good or bad.

Either way, if you want the parser to express arbitrary expressions, you probably want to extend the grammar with explicit parentheses so that you can override the default precedence of each operator.

As for the parser itself, you could model it directly on a rewritten grammar, e.g.:

parseS :: Parser S
parseS = do
  n <- natural
  f <- parseS1
  return (f n)

natural :: Parser S
natural = do
  n <- read <$> many1 digit
  return (Nat n)

parseS1 :: Parser (S -> S)
parseS1 = parseFac <|> parseMul <|> parseNat
  where
    parseFac = do
      char '!'
      return (\s -> Fac s)

    parseMul = do
      char '*'
      s2 <- parseS
      return (\s1 -> Mul s1 s2)

    parseNat = do
      eof -- ε
      return (\s -> s)

Then you'll have to deal with whitespace:

> parse parseS "" "1*2*3"
Right (Mul (Nat 1) (Mul (Nat 2) (Nat 3)))

> parse parseS "" "1 * 2 * 3"
Left (line 1, column 2):
unexpected ' '
expecting digit, "!", "*" or end of input

> parse parseS "" " 1*2*3"
Left (line 1, column 1):
unexpected " "
expecting digit

> parse parseS "" "1*2*3 "
Left (line 1, column 6):
unexpected ' '
expecting digit, "!", "*" or end of input

I'd consult a tutorial or a book on getting this part right.

Finally you may want to use some of the higher-level features of various parser combinator libraries like chainr1 or Text.Megaparsec.Expr's makeExprParser that try to deal with this kind of thing in a less bothersome way. Still, before using them, it is wise to understand how they're implemented by making parsers manually as an exercise like you're currently doing. For example, how would you transform the parser above so that "*" is left-associative or that "!" has a lower precedence?

How do I parse these operations?

1 Answers1