In order to create better error messages in a later step I want to save the positions on which a parser succeeds as well as the text. Getting the positions seems pretty easy (since there is the getPosition parser), but I don't know how I can access the text.

Let's say I have this type to save the location

type SourceLocation = {
    from: Position
    ``to``: Position  // `to` is a reserved word in F#, so it needs double backticks
    text: string
}

and I want to create a function that adds a SourceLocation to the result of another parser:

let trackLocation (parser: Parser<'A, 'B>): Parser<SourceLocation * 'A, 'B> =
    let mkLocation ((start: Position, data: 'A), stop: Position) =
        let location = { from = start; ``to`` = stop }  // how do I get the text?
        in (location, data)
    getPosition .>>. parser .>>. getPosition |>> mkLocation

Since parsers are just functions taking a CharStream, I thought I could use the stream together with the Index of my positions to get the text, but I could not find a method that returns this text.

So what is the correct way to get the text on which a parser succeeds?

danielspaniol
  • Where is the parser's text coming from? Do you have a local copy of the parsed text in a string or a local file, or is it coming from a stream that you can't rewind? Because one possible way to solve your problem is to say "Hey, I have the start and stop positions, I'll just look up that part of the text". E.g., if your text is in a string variable called `inputText`, then you would just need `inputText.[start.Index .. stop.Index]` and you've got the matched text. Note that FParsec's `Position.Index` property is an int64, so you might need to cast to `int` if your input is 2^32 bytes or less. – rmunn Jun 25 '18 at 07:58
  • Or that might be `inputText.[start.Index .. stop.Index - 1]`: I haven't experimented with `getPosition` and I don't know if you'll have closed intervals or half-open intervals. Check for fencepost errors before you blindly apply my suggestion. – rmunn Jun 25 '18 at 07:59
  • I can get a copy of the text but this seems kinda hacky to me. Is there no way to do this with just parsers or the parsers streams? It would be optimal to have the text in the AST already so I can forget the input file after parsing – danielspaniol Jun 25 '18 at 08:03
  • I think the design of FParsec is such that you normally wouldn't deal with the text, e.g. the `sepBy` parser returns a `Parser<'a list, 'u>` and you don't have to deal with the original text. But I think the [`CharStream.BacktrackTo` method](http://www.quanttec.com/fparsec/reference/charstream.html#CharStream_1.members.BacktrackTo) might be what you need. Give me a second and I'll write up a possible approach. – rmunn Jun 25 '18 at 08:17
  • Turned out that `CharStream.ReadFrom` is exactly what you're looking for. You have to pass it a CharStreamState, not a Position, but apart from that it's quite handy. – rmunn Jun 25 '18 at 08:51
  • There are also the `skipped` and `withSkippedString` combinators: http://www.quanttec.com/fparsec/reference/parser-overview.html#parsing-strings-with-the-help-of-other-parsers – Stephan Tolksdorf Jun 25 '18 at 13:07
  • @StephanTolksdorf - Thanks; that's an even better suggestion than `CharStream.ReadFrom` since it allows staying at the "higher" combinator level of FParsec rather than dropping down to the "lower" level of CharStream and Reply objects. I've updated my answer to use `withSkippedString`. – rmunn Jun 26 '18 at 02:32
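
To make the difference between the two combinators mentioned in the comments concrete, here is a minimal sketch. It assumes the FParsec NuGet package is available; the names `digitsText` and `digitsWithCount` are hypothetical and only serve as illustration. Note that `skipped` expects a parser that returns `unit`, which is why `skipMany1` is used there.

```fsharp
#r "nuget: FParsec"   // only needed when running as a script (dotnet fsi)
open FParsec

// `skipped` keeps only the text the inner parser consumed.
// Hypothetical example parser: one or more digits, returned as a string.
let digitsText : Parser<string, unit> =
    skipped (skipMany1 digit)

// `withSkippedString` passes the consumed text AND the parser's own result
// to a function you supply. Here: the text plus the number of digits parsed.
let digitsWithCount : Parser<string * int, unit> =
    many1 digit |> withSkippedString (fun text ds -> text, List.length ds)

// Small helper (hypothetical) that unwraps a successful result or throws.
let runOrFail p input =
    match run p input with
    | Success(result, _, _) -> result
    | Failure(msg, _, _) -> failwith msg
```

With these definitions, `runOrFail digitsText "123abc"` should yield only the matched text, while `runOrFail digitsWithCount "123abc"` should yield the matched text paired with the digit count.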

1 Answer

I think what you probably want is the CharStream.ReadFrom method:

Returns a string with the chars between the index of the stateWhereStringBegins (inclusive) and the current Index of the stream (exclusive).

What you'd do is this:

let trackLocation (parser: Parser<'A, 'B>): Parser<SourceLocation * 'A, 'B> =
    fun (stream : CharStream<'B>) ->
        let oldState = stream.State
        let parseResult = parser stream
        if parseResult.Status = Ok then
            let newState = stream.State
            let matchedText = stream.ReadFrom (oldState, true)
            // Or (oldState, false) if you DON'T want to normalize newlines
            let location = { from = oldState.GetPosition stream
                             ``to`` = newState.GetPosition stream
                             text = matchedText }
            let result = (location, parseResult.Result)
            Reply(result)
        else
            Reply(parseResult.Status, parseResult.Error)

Usage example (which also happens to be the test code that I wrote to confirm that it works):

let pThing = trackLocation pfloat
let test p str =
    match run p str with
    | Success((loc, result), _, _)   -> printfn "Success: %A at location: %A" result loc; result
    | Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg; 0.0
test pThing "3.5"
// Prints: Success: 3.5 at location: {from = (Ln: 1, Col: 1);
//                                    to = (Ln: 1, Col: 4);
//                                    text = "3.5";}

Edit: Stephan Tolksdorf (the author of FParsec) pointed out in a comment that the withSkippedString combinator exists. That one will probably be simpler, as you don't have to write the CharStream-consuming function yourself. (The skipped combinator would return the string that the parser matched, but without returning the parser's result, whereas withSkippedString passes both the parser's result and the string skipped over into a function that you supply). By using the withSkippedString combinator, you can use your original trackLocation function with only minimal changes. The updated version of trackLocation would look like this:

let trackLocation (parser: Parser<'A, 'B>): Parser<SourceLocation * 'A, 'B> =
    let mkLocation ((start: Position, (text: string, data: 'A)), stop: Position) =
        let location = { from = start; ``to`` = stop; text = text }
        in (location, data)
    getPosition .>>. (parser |> withSkippedString (fun a b -> a,b)) .>>. getPosition |>> mkLocation

(I'm not 100% happy with the arrangement of the tuples here, since it results in a tuple within a tuple within a tuple. A different combinator order might yield a nicer signature. But since it's an internal function not intended for public consumption, a nasty tuple-nesting in the function signature may not be a big deal, so I've left it as is. Up to you to rearrange it if you want a better function signature).

The same test code from my original answer runs fine with this updated version of the function, and prints the same result: start position (Line 1, Col 1), end position (Line 1, Col 4), and parsed text "3.5".
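
As for the nested tuples mentioned above, one possible way to flatten them is FParsec's `pipe3` combinator, which applies three parsers in sequence and hands their results to a single function. The following is only a sketch of that rearrangement, not a tested replacement: it assumes the FParsec NuGet package is available and redeclares `SourceLocation` so that the snippet is self-contained.

```fsharp
#r "nuget: FParsec"   // only needed when running as a script (dotnet fsi)
open FParsec

type SourceLocation =
    { from: Position
      ``to``: Position
      text: string }

// Same idea as the withSkippedString version, but pipe3 passes the three
// results as separate arguments instead of a tuple within a tuple.
let trackLocation (parser: Parser<'A, 'B>) : Parser<SourceLocation * 'A, 'B> =
    pipe3
        getPosition
        (parser |> withSkippedString (fun text data -> text, data))
        getPosition
        (fun start (text, data) stop ->
            { from = start; ``to`` = stop; text = text }, data)

// Usage, mirroring the earlier test with pfloat.
let pThing : Parser<SourceLocation * float, unit> = trackLocation pfloat

match run pThing "3.5" with
| Success((loc, value), _, _) ->
    printfn "Parsed %f from text %s (Col %d to Col %d)"
        value loc.text loc.from.Column loc.``to``.Column
| Failure(msg, _, _) -> printfn "Failure: %s" msg
```

The public signature of `trackLocation` is unchanged; only the internal plumbing differs, so which variant to use is mostly a matter of taste.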

rmunn