11

I've decided to check out FParsec, and tried to write a parser for λ expressions. As it turns out, eagerness makes recursive parsing difficult. How can I solve this?

Code:

open FParsec

type λExpr =
    | Variable of char
    | Application of λExpr * λExpr
    | Lambda of char * λExpr

let rec FV = function
    | Variable v -> Set.singleton v
    | Application (f, x) -> FV f + FV x
    | Lambda (x, m) -> FV m - Set.singleton x

let Λ0 = FV >> (=) Set.empty

let apply f p =
    parse
        { let! v = p
          return f v }

let λ e =

    let expr, exprR = createParserForwardedToRef()

    let var = lower |> apply Variable

    let app = tuple2 expr expr
                 |> apply Application

    let lam = pipe2 (pchar 'λ' >>. many lower)
                        (pchar '.' >>. expr) (fun vs e ->
                                                List.foldBack (fun c e -> Lambda (c, e)) vs e)

    exprR := choice [
                    lam
                    app
                    var
                    (pchar '(' >>. expr .>> pchar ')')
                    ]

    run expr e

Thanks!

Ramon Snir
  • 7,520
  • 3
  • 43
  • 61

1 Answers1

10

As you pointed out, the problem is that your parser for application calls expr recursively and so there is an infinite loop. The parser needs to be written such that it always consumes some input and then decides what to do.

In case of lambda calculus, the tricky thing is recognizing an application and a variable because if you have input like x... then the first character suggests it could be either of them.

You can merge the rules for application and variable like this:

let rec varApp = parse {
  let! first = lower |> apply Variable
  let! res = 
    choice [ expr |> apply (fun e -> Application(first, e))
             parse { return first } ]
  return res }

This first parses a variable and then either parses another expression (in which case it is an application) or it just returns the variable (if there is no expression following the variable). The rest of the rules are similar:

and lam = 
  pipe2 (pchar 'λ' >>. many lower)
        (pchar '.' >>. expr) (fun vs e ->
    List.foldBack (fun c e -> Lambda (c, e)) vs e)
and brac = pchar '(' >>. expr .>> pchar ')'
and expr = parse.Delay(fun () ->
  choice 
    [ lam; varApp; brac ])

I just avoided the need for explicit mutation by using parse.Delay() (which makes it possible to create recursive value references). In principle, it could be written as:

and expr = parse {
  return! choice [ lam; varApp; brac ] }

...but for some reason, FParsec doesn't implement the ReturnFrom member that is needed if you want to use return! in computation expressions.

Tomas Petricek
  • 240,744
  • 19
  • 378
  • 553
  • 4
    Thanks for bringing this to my attention. The omission of the ReturnFrom member from the 'parse' builder object is an oversight. In a previous version of F#, ReturnFrom was defined implicitly. (The definition is trivial.) This was supposed to be fixed in FParsec 0.9, but I forgot about it. I've just checked the fix into the BitBucket repository. – Stephan Tolksdorf Jun 01 '11 at 16:52
  • 1
    Personally I no longer use the computation expression syntax for constructing parsers due to the performance issues described here: http://www.quanttec.com/fparsec/users-guide/where-is-the-monad.html#why-the-monadic-syntax-is-slow – Stephan Tolksdorf Jun 01 '11 at 16:53
  • @Stephan Tolksdorf - The lack of `ReturnFrom` bothered me in the beginning but I have started avoiding the computation expression syntax because I noticed the parsers were much faster that way. I've just started getting used to your operator precedence parser which has given me an even bigger speed boost. – ChaosPandion Jun 01 '11 at 17:13
  • @Stephan - Thanks for the answer and for adding `ReturnFrom` :-). The points about performance are very interesting! Would it be possible to solve it using some combinator (e.g. in the style of `Seq.cache`)? – Tomas Petricek Jun 01 '11 at 20:01
  • @ChaosPandion - Nice to hear that the operator precedence parser speeds up your parser. If there's any thing in FParsec that bothers you besides the previously missing ReturnFrom, please let me know! – Stephan Tolksdorf Jun 03 '11 at 18:58
  • 2
    @Tomas Petricek - The equivalent to Seq.cache for FParsec would be a memoization combinator like the one described in http://www.quanttec.com/fparsec/users-guide/tips-and-tricks.html#memoizing-parsers However, such a combinator will only help performance in certain special situations where you have a frequently backtracking parser. To get rid of the overhead associated with the computation expression syntax you either need more advanced compiler optimizations or you need to use use meta-programming techniques to essentially perform these optimizations yourself. – Stephan Tolksdorf Jun 03 '11 at 19:00
  • 1
    Fortunately, the overhead associated with the computation expression syntax is usually negligible for the most important application of computation expressions in F#: async-expressions. (seq-expressions are a special case, since the F# compiler compiles them into state machines.) – Stephan Tolksdorf Jun 03 '11 at 19:07