
I'm trying to make a parser for a simple functional language, a bit like Caml, but I seem to be stuck with the simplest things.

So I'd like to know if there are some more complete examples of parsec parsers, something that goes beyond "this is how you parse 2 + 3". Especially function calls in terms and suchlike.

I've read "Write Yourself a Scheme in 48 Hours", but Scheme's syntax is so simple that it doesn't help much with learning this.

My biggest problem is how to use try, <|> and choice properly. I really don't understand why Parsec never parses a(6) as a function call with this parser:

expr = choice [number, call, ident]

number = liftM Number float <?> "Number"

ident = liftM Identifier identifier <?> "Identifier"

call = do
    name <- identifier
    args <- parens $ commaSep expr
    return $ FuncCall name args
    <?> "Function call"

EDIT: Added the rest of my code for completeness, though this is not actually what I asked about:

AST.hs

module AST where

data AST
    = Number Double
    | Identifier String
    | Operation BinOp AST AST
    | FuncCall String [AST]
    deriving (Show, Eq)

data BinOp = Plus | Minus | Mul | Div
    deriving (Show, Eq, Enum)

Lexer.hs

module Lexer (
            identifier, reserved, operator, reservedOp, charLiteral, stringLiteral,
            natural, integer, float, naturalOrFloat, decimal, hexadecimal, octal,
            symbol, lexeme, whiteSpace, parens, braces, angles, brackets, semi,
            comma, colon, dot, semiSep, semiSep1, commaSep, commaSep1
    ) where

import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

lexer = P.makeTokenParser haskellStyle

identifier = P.identifier lexer
reserved = P.reserved lexer
operator = P.operator lexer
reservedOp = P.reservedOp lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
natural = P.natural lexer
integer = P.integer lexer
float = P.float lexer
naturalOrFloat = P.naturalOrFloat lexer
decimal = P.decimal lexer
hexadecimal = P.hexadecimal lexer
octal = P.octal lexer
symbol = P.symbol lexer
lexeme = P.lexeme lexer
whiteSpace = P.whiteSpace lexer
parens = P.parens lexer
braces = P.braces lexer
angles = P.angles lexer
brackets = P.brackets lexer
semi = P.semi lexer
comma = P.comma lexer
colon = P.colon lexer
dot = P.dot lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer

Parser.hs

module Parser where

import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Lexer
import AST

expr = number <|> callOrIdent

number = liftM Number float <?> "Number"

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)
Lanbo
  • The specific question should be easy to answer, but I'd prefer to try with a full, compilable code sample demonstrating your problem... could you provide one? – sclv Nov 21 '11 at 21:37
  • I do note, however, that you don't use `try` anywhere. In your minimal example, I'm not sure if it matters, but in any larger sample it certainly would. – sclv Nov 21 '11 at 21:38
  • Trying to provide my whole program so far. – Lanbo Nov 21 '11 at 21:58

2 Answers


Hmm,

*Expr> parse expr "" "a(6)"
Right (FuncCall "a" [Number 6.0])

that part works for me after filling out the missing pieces.

Edit: I filled out the missing pieces by writing my own float parser, which could parse integer literals. The float parser from Text.Parsec.Token on the other hand, only parses literals with a fraction part or an exponent, so it failed parsing the "6".
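One possible way to accept both integer and fractional literals without writing a float parser by hand is the `naturalOrFloat` token parser, which the Lexer module in the question already exports. This is only a sketch: the cut-down `AST` type and the choice to convert integers to `Double` are assumptions made to keep the example self-contained.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

-- Cut-down AST for illustration; the question's AST has more constructors.
data AST = Number Double
    deriving (Show, Eq)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellStyle

-- naturalOrFloat yields Left for an integer literal and Right for a
-- literal with a fraction or exponent; both are mapped to a Double here.
number :: Parser AST
number = (Number . either fromIntegral id <$> P.naturalOrFloat lexer) <?> "Number"
```

With this definition, `parse number "" "6"` gives `Right (Number 6.0)`, and `parse number "" "6.5"` gives `Right (Number 6.5)`.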

However,

*Expr> parse expr "" "variable"
Left (line 1, column 9):
unexpected end of input
expecting "("

When call fails after having parsed an identifier, that part of the input has already been consumed, so ident is never tried and the overall parse fails. You can either a) use try call in the choice list of expr, so that call fails without consuming input, or b) write a parser callOrIdent to use in expr, e.g.

callOrIdent = do
    name <- identifier
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)

which avoids try and thus may perform better.
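For comparison, option a) might look roughly like the sketch below. It assumes the same token parsers as the question's Lexer module (inlined here so the example stands alone) and leaves out the `Operation` constructor, which isn't needed to show the idea.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

-- Subset of the question's AST, enough to show the backtracking behaviour.
data AST
    = Number Double
    | Identifier String
    | FuncCall String [AST]
    deriving (Show, Eq)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellStyle

-- try makes call give back the identifier it consumed when no "(" follows,
-- so the next alternative (ident) gets to see the same input.
expr :: Parser AST
expr = choice [number, try call, ident]

number :: Parser AST
number = Number <$> P.float lexer <?> "Number"

ident :: Parser AST
ident = Identifier <$> P.identifier lexer <?> "Identifier"

call :: Parser AST
call = (do
    name <- P.identifier lexer
    args <- P.parens lexer (P.commaSep lexer expr)
    return (FuncCall name args)) <?> "Function call"
```

Here `parse expr "" "variable"` gives `Right (Identifier "variable")` and `parse expr "" "a(6.0)"` gives `Right (FuncCall "a" [Number 6.0])`. The cost is that the identifier is parsed twice whenever call fails, which is what callOrIdent avoids.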

Daniel Fischer
  • I use the lexer functions of `Text.Parsec.Token` for `identifier`, etc. For some reason, I get entirely different parsing results for the code you give me. – Lanbo Nov 21 '11 at 21:54
  • @Scán ah, the `float` parser of `Token` doesn't parse integer literals, you'd have to write `a(6.0)` or similar. But apart from that, it behaves like the above, more or less. – Daniel Fischer Nov 21 '11 at 22:13
  • ... I feel dumb now. Thanks! (I'd still wish for those full examples) – Lanbo Nov 21 '11 at 22:16
  • @LambdaDusk Why is there always an empty "" preceding the actual input text in this example? – J Fritsch Nov 23 '12 at 00:02
  • @JFritsch That's the source name. If you have a parse error, and the source name is not empty, it's reported, so you get e.g. a parse error reported as `Left "Somefile.csv" (line 14, column 5): unexpected "1" expecting letter or ","`. With an empty source name, that is omitted from the report. – Daniel Fischer Nov 23 '12 at 00:12
  • How can you allow only (pre)defined function call names? – J Fritsch Nov 23 '12 at 00:18
  • You mean the parser should only accept certain names and fail for others? You can parse the name, and look it up in a set of allowed names (maintained as part of the state, for example), rejecting it if it is not found. Or, if the allowed names are known when writing the code, `choice [try $ string functionName | functionName <- allowedNames]` would be simpler. – Daniel Fischer Nov 23 '12 at 00:23
  • @JFritsch You can define them as `reservedNames` in the [lexer](http://hackage.haskell.org/packages/archive/parsec/3.1.2/doc/html/Text-Parsec-Token.html) and then parse them using `reserved`. But that would mean you have to write parsing code for every single one of them, like you do for `if` and `for`. – Lanbo Nov 23 '12 at 00:24
  • Why wouldn't `data Resname = mine | alsomine | yours ...` together with `data AST = ... | FuncCall Resname [AST]` do this job? – J Fritsch Nov 23 '12 at 00:40
  • @JFritsch (Aside: The value constructors need to start with an upper case letter.) You'd still need to write a parser for `Resname`, one string for every constructor. That wouldn't really simplify things versus using `String` for the function names. – Daniel Fischer Nov 23 '12 at 00:46
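
The `choice`-based suggestion from the comments above could be sketched as follows. The names in `allowedNames` are made up for illustration, and the `notFollowedBy alphaNum` check is an addition that is not in the original comment; it is there so that a longer identifier such as `sinh` is not accepted as `sin`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical list of permitted function names.
allowedNames :: [String]
allowedNames = ["sin", "cos", "sqrt"]

-- Accept exactly one of the allowed names. try lets each alternative
-- fail without consuming input, so the next name can be attempted.
knownFunction :: Parser String
knownFunction =
    choice [ try (string name <* notFollowedBy alphaNum) | name <- allowedNames ]
        <?> "known function name"
```

This parser works on raw characters and does not skip trailing whitespace; wrapping it in the lexer's `lexeme` would be one way to make it behave like the other token parsers.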

I wrote up a series of examples on how to parse Roman Numerals with parsec. It's pretty basic but you or other newcomers may find it useful:

https://github.com/russell91/roman

RussellStewart