I'm stuck on a problem with writing a parser in Haskell that I hope someone can help out with!
It is a bit more complicated than my usual parser because there are two layers of parsing. First a language definition is parsed into an AST, then that AST is transformed into another parser that parses the actual language.
I have made pretty good progress so far but I'm stuck on implementing recursion in the language definition. As the language definition is transformed from AST into a parser in a recursive function, I can't work out how it can call itself if it doesn't exist yet.
I'm finding it a bit hard to explain my problem, so maybe an example will help.
The language definition might define that a language consists of three keywords in sequence and then optional recursion in brackets.
A B C ($RECURSE)
Which would be parsed into an AST like:
[Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]
The Many
is not really required for this example, but in my actual project, optional blocks can have multiple syntax elements in them so an Optional
would contain a Many
with n elements.
I would then want it to get transformed into a parser that parses strings like:
A B C
A B C (A B C)
A B C (A B C (A B C))
I've boiled down my project into the simplest possible example. You can see my TODO comment where I'm stuck trying to implement the recursion.
{-# LANGUAGE OverloadedStrings #-}
module Example
( runExample,
)
where
import Control.Applicative hiding (many, some)
import Data.Text (Text)
import Data.Void
import System.IO as SIO
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char (space1, string')
import qualified Text.Megaparsec.Char.Lexer as L
import Text.Megaparsec.Debug
import Text.Pretty.Simple (pPrint)
-- Types
type Parser = Parsec Void Text
data SyntaxAst = Keyword Text | Recurse | Optional SyntaxAst | Many [SyntaxAst]
-- Megaparsec Base Parsers
-- Space consumer - used by other parsers to ignore whitespace
sc :: Parser ()
sc =
L.space
space1
(L.skipLineComment "--")
(L.skipBlockComment "/*" "*/")
-- Runs a parser, then consumes any left over space with sc
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
-- Parses a string, then consumes any left over space with sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
-- Parses something between parentheses
inParens :: Parser a -> Parser a
inParens =
between
(symbol "(")
(symbol ")")
-- Transforms the AST into a parser
transformSyntaxExprToParser :: SyntaxAst -> Parser [Text]
transformSyntaxExprToParser (Many exprs) = dbg "Many" (createParser exprs)
transformSyntaxExprToParser (Keyword text) = dbg "Keyword" (pure <$> lexeme (string' text))
transformSyntaxExprToParser (Optional inner) = dbg "Optional" (option [] (try (inParens (transformSyntaxExprToParser inner))))
transformSyntaxExprToParser Recurse = dbg "Recurse" (pure ["TODO"]) -- TODO: How do I recurse here?
-- transformSyntaxExprToParser s Recurse = dbg "Recurse" (createParser s) -- Seems to work in the example, but in my actual application creates an infinite loop and freezes
-- Walks over the parser AST and convert it to a parser
createParser :: [SyntaxAst] -> Parser [Text]
createParser expressions =
do
foldr1 (liftA2 (<>)) (fmap transformSyntaxExprToParser expressions)
runExample :: IO ()
runExample = do
-- To make the example simple, lets cut out the language definition parsing and just define
-- it literally.
let languageParser = createParser [Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]
let run p = runParser p "" "A B C (A B C (A B C))"
let result = run languageParser
case result of
Left bundle -> SIO.putStrLn (errorBundlePretty bundle)
Right xs -> pPrint xs
A few things I've tried:
- Pass the original AST up to the
transformSyntaxExprToParser
function and callcreateParser
when theRecurse
token is encountered. This didn't work due to infinite loops. - Using mutable references like IORef/STRef to pass in a reference that is updated to reference the final parser once the transformation is finished. I couldn't work out how to thread the IO/ST monads into the parser transform function.
- State monads. I couldn't work out how to pass a reference through the state monad.
I hope that makes sense, let me know if I need to elaborate more. I can also push up my full project if it will help.
Thanks for reading!
Edit: I've made changes to my original example to demonstrate the infinite loop problem (integrating the excellent suggestions in the answer below) at https://pastebin.com/DN0JJ9BA