I am trying to get more familiar with megaparsec, and I am running into some issues with presedences. By 'nested data' in the title I refer to the fact that I am trying to parse types, which in turn could contain other types. If someone could explain why this does not behave as I would expect, please don't hesitate to tell me.
I am trying to parse types similar to those found in Haskell. Types are either base types Int
, Bool
, Float
or type variables a
(any lowercase word).
We can also construct algebraic data types from type constructors (Uppercase words) such as Maybe
and type parameters (any other type). Examples are Maybe a
and Either (Maybe Int) Bool
. Functions associate to the right and are constructed with ->
, such as Maybe a -> Either Int (b -> c)
. N-ary tuples are a sequence of types separated by ,
and enclosed in (
and )
, such as (Int, Bool, a)
. A type can be wrapped in parenthesis to raise its precedence level (Maybe a)
. A unit type ()
is also defined.
I am using this ADT to describe this.
newtype Ident = Ident String
newtype UIdent = UIdent String
data Type a
= TLam a (Type a) (Type a)
| TVar a Ident
| TNil a
| TAdt a UIdent [Type a]
| TTup a [Type a]
| TBool a
| TInt a
| TFloat a
I have tried to write a megaparsec
parser to parse such types, but I get unexpected results. I attach the relevant code below after which I will try to describe what I experience.
{-# LANGUAGE OverloadedStrings #-}
module Parser where
import AbsTinyCamiot
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as Lexer
import Text.Megaparsec.Debug
import Control.Applicative hiding (many, some, Const)
import Control.Monad.Combinators.Expr
import Control.Monad.Identity
import Data.Void
import Data.Text (Text, unpack)
type Parser a = ParsecT Void Text Identity a
-- parse types
pBaseType :: Parser (Type ())
pBaseType = choice [
TInt () <$ label "parse int" (pSymbol "Int"),
TBool () <$ label "parse bool" (pSymbol "Bool"),
TFloat () <$ label "parse float" (pSymbol "Float"),
TNil () <$ label "parse void" (pSymbol "()"),
TVar () <$> label "parse type variable" pIdent]
pAdt :: Parser (Type ())
pAdt = label "parse ADT" $ do
con <- pUIdent
variables <- many $ try $ many spaceChar >> pType
return $ TAdt () con variables
pType :: Parser (Type ())
pType = label "parse a type" $
makeExprParser
(choice [ try pFunctionType
, try $ parens pType
, try pTupleType
, try pBaseType
, try pAdt
])
[]--[[InfixR (TLam () <$ pSymbol "->")]]
pTupleType :: Parser (Type ())
pTupleType = label "parse a tuple type" $ do
pSymbol "("
fst <- pType
rest <- some (pSymbol "," >> pType)
pSymbol ")"
return $ TTup () (fst : rest)
pFunctionType :: Parser (Type ())
pFunctionType = label "parse a function type" $ do
domain <- pType
some spaceChar
pSymbol "->"
some spaceChar
codomain <- pType
return $ TLam () domain codomain
parens :: Parser a -> Parser a
parens p = label "parse a type wrapped in parentheses" $ do
pSymbol "("
a <- p
pSymbol ")"
return a
pUIdent :: Parser UIdent
pUIdent = label "parse a UIdent" $ do
a <- upperChar
rest <- many $ choice [letterChar, digitChar, char '_']
return $ UIdent (a:rest)
pIdent :: Parser Ident
pIdent = label "parse an Ident" $ do
a <- lowerChar
rest <- many $ choice [letterChar, digitChar, char '_']
return $ Ident (a:rest)
pSymbol :: Text -> Parser Text
pSymbol = Lexer.symbol pSpace
pSpace :: Parser ()
pSpace = Lexer.space
(void spaceChar)
(Lexer.skipLineComment "--")
(Lexer.skipBlockComment "{-" "-}")
This might be overwhelming, so let me explain some key points. I understand that I have a lot of different constructions that could match on an opening parenthesis, so I've wrapped those parsers in try
, such that if they fail I can try the next parser that might consume an opening parenthesis. Perhaps I am using try
too much? Does it affect performance to potentially backtrack so much?
I've also tried to make an expression parser by defining some terms and an operator table. You can see now that I've commented out the operator (function arrow), however. As the code looks right now I loop infinitely when I try to parse a function type. I think this might be due to the fact that when I try to parse a function type (invoked from pType
) I immediately try to parse a type representing the domain of the function, which again call pType
. How would I do this correctly?
If I decide to use the operator table instead, and not use my custom parser for function types, I parse things using wrong precedences. E.g Maybe a -> b
gets parsed as Maybe (a -> b)
, while I would want it to be parsed as (Maybe a) -> b
. Is there a way where I could use the operator table and still have type constructors bind more tightly than the function arrow?
Lastly, as I am learning megaparsec as I go, if anyone sees any misunderstandings or things that are wierd/unexpected, please tell me. I've read most of this tutorial to get my this far.
Please let me know of any edits I can make to increase the quality of my question!