7

As an exercise, I'm implementing a parser for an exceedingly simple language defined in Haskell using the following GADT (the real grammar for my project involves many more expressions, but this extract is sufficient for the question):

data Expr a where
    I   :: Int -> Expr Int
    Add :: [Expr Int] -> Expr Int

The parsing functions are as follows:

expr :: Parser (Expr Int)
expr = foldl1 mplus
    [ lit 
    , add 
    ]   

lit :: Parser (Expr Int)
lit = I . read <$> some digit

add :: Parser (Expr Int)
add = do
  i0 <- expr
  is (== '+')
  i1 <- expr
  is <- many (is (== '+') *> expr)
  pure (Add (i0:i1:is))

Due to the left-recursive nature of the expression grammar, when I attempt to parse something as simple as 1+1 using the expr parser, the parser get stuck in an infinite loop.

I've seen examples of how to factor out left recursion across the web using a transformation from something like:

S -> S a | b

Into something like:

S -> b T
T -> a T

But I'm struggling with how to apply this to my parser.

For completeness, here is the code that actually implements the parser:

newtype Parser a = Parser
    { runParser :: String -> [(a, String)]
    }   

instance Functor Parser where
    fmap f (Parser p) = Parser $ \s ->
      fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative Parser where
    pure a = Parser $ \s -> [(a, s)] 
    (<*>) (Parser f) (Parser p) = Parser $ \s ->
      concat $ fmap (\(f', r) -> fmap (\(a, r') -> (f' a, r')) (p r)) (f >

instance Alternative Parser where
    empty = Parser $ \s -> []
    (<|>) (Parser a) (Parser b) = Parser $ \s ->
      case a s of
        (r:rs) -> (r:rs)
        []     -> case b s of
                    (r:rs) -> (r:rs)
                    []     -> []

instance Monad Parser where
    return = pure
    (>>=) (Parser a) f = Parser $ \s ->
      concat $ fmap (\(r, rs) -> runParser (f r) rs) (a s)

instance MonadPlus Parser where
    mzero = empty
    mplus (Parser a) (Parser b) = Parser $ \s -> a s ++ b s 

char  = Parser $ \case (c:cs) -> [(c, cs)]; [] -> []
is p  = char >>= \c -> if p c then pure c else empty
digit = is isDigit
  • You may want to look at https://en.m.wikipedia.org/wiki/Operator-precedence_parser – dfeuer Sep 19 '15 at 02:48
  • Also, you might consider using `attoparsec` instead of rolling your own parsing framework. – dfeuer Sep 19 '15 at 02:49
  • 1
    @dfeuer, but then we'd be missing the purpose of the exercise! That operator precedence looks like a faily good solution though.. Ideally we can get it to work with this recursive descent parser. –  Sep 19 '15 at 02:50
  • 1
    Also note that `mplus` is generally expected to match `<|>`. – dfeuer Sep 19 '15 at 02:57
  • Fair about the point! However, you might consider testing parsers with `attoparsec` to make sure they're the problem and not your framework. Just a thought. – dfeuer Sep 19 '15 at 02:58
  • As for the immediate problem, I think you probably want to parse in a "right-associative" fashion. When parsing a sum, don't allow the left operand to be a sum. – dfeuer Sep 19 '15 at 03:03

1 Answers1

3

Suppose you want to parse non-parenthesized expressions involving literals, addition, and multiplication. You can do this by cutting down the list by precedence. Here's one way to do it in attoparsec, which should be pretty similar to what you'd do with your parser. I'm no parsing expert, so there might be some errors or infelicities.

import Data.Attoparsec.ByteString.Char8
import Control.Applicative

expr :: Parser (Expr Int)
expr = choice [add, mul, lit] <* skipSpace
-- choice is in Data.Attoparsec.Combinators, but is
-- actually a general Alternative operator.

add :: Parser (Expr Int)
add = Add <$> addList

addList :: Parser [Expr Int]
addList = (:) <$> addend <* skipSpace <* char '+' <*> (addList <|> ((:[]) <$> addend))

addend :: Parser (Expr Int)
addend = mul <|> multiplicand

mul :: Parser (Expr Int)
mul = Mul <$> mulList

mulList :: Parser [Expr Int]
mulList = (:) <$> multiplicand <* skipSpace <* char '*' <*> (mulList <|> ((:[]) <$> multiplicand))

multiplicand :: Parser (Expr Int)
multiplicand = lit

lit :: Parser (Expr Int)
lit = I <$> (skipSpace *> decimal)
dfeuer
  • 48,079
  • 5
  • 63
  • 167