
I am trying to parse a list of tokens with FParsec, where each token is either a block of text or a tag - for example:

This is a {type of test} test, and it {succeeds or fails}

Here is the parser:

type Parser<'t> = Parser<'t, unit>

type Token =
| Text of string
| Tag of string

let escape fromString toString : Parser<_> =
    pstring fromString |>> (fun c -> toString)

let content : Parser<_> =
    let contentNormal = many1Satisfy (fun c -> c <> '{' && c <> '}')
    let openBraceEscaped = escape "{{" "{"
    let closeBraceEscaped = escape "}}" "}"
    let contentEscaped = openBraceEscaped <|> closeBraceEscaped
    stringsSepBy contentNormal contentEscaped

let ident : Parser<_> =
    let isIdentifierFirstChar c = isLetter c || c = '_'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_'
    spaces >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" .>> spaces

let text = content |>> Text

let tag = 
    ident |> between (skipString "{") (skipString "}")
    |>> Tag

let token = text <|> tag
let tokens = many token .>>. eof   

The following tests work:

> run token "abc def" ;;
val it : ParserResult<Token,unit> = Success: Text "abc def"

> run token "{abc def}" ;;
val it : ParserResult<Token,unit> = Success: Tag "abc def"

But trying to run tokens results in an exception:

> run tokens "{abc} def" ;;
System.InvalidOperationException: (Ln: 1, Col: 10): The combinator 'many' was 
    applied to a parser that succeeds without consuming input and without
    changing the parser state in any other way. (If no exception had been raised,
    the combinator likely would have entered an infinite loop.)

I've gone over this Stack Overflow question, but nothing I've tried works. I even added the following, and I still get the same exception:

let tokenFwd, tokenRef = createParserForwardedToRef<Token, unit>()
do tokenRef := choice [tag; text]
let readEndOfInput : Parser<unit, unit> = spaces >>. eof
let readExprs = many tokenFwd
let readExprsTillEnd = readExprs .>> readEndOfInput

run readExprsTillEnd "{abc} def"  // System.InvalidOperationException ... The combinator 'many' was applied  ...

I believe the problem is stringsSepBy in content, but I can't figure out any other way to get a string that includes the escaped items.

Any help would be much appreciated - I have been going through this for a couple days now and can't figure it out.

jjmac

2 Answers

stringsSepBy accepts zero strings, so token can succeed on an empty string, and that is what many complains about.

To verify that this is the line you need to work on, I changed it to the following:

many1 (contentNormal <|> contentEscaped) |>> fun l -> String.concat "" l

I also moved away from stringsSepBy contentNormal contentEscaped, because it requires contentNormal matches with a contentEscaped match between each pair of them. So a{{b}}c is OK, but {{b}}, {{b}}c and a{{b}} will fail.
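
For reference, here is a minimal sketch of the whole parser with that change applied. It assumes the code runs as an F# script with FParsec pulled from NuGet, that identifiers may contain spaces (as the sample tags do), and that tokens should return only the token list. The parseTokens helper is hypothetical and is there only to make the results easy to print.

```fsharp
// Assumption: run with `dotnet fsi`, which resolves FParsec from NuGet.
#r "nuget: FParsec"
open FParsec

type Token =
    | Text of string
    | Tag of string

// Text: one or more chunks, each either ordinary characters or an escaped brace.
// many1 fails (without consuming) when no chunk matches, so `many token` is safe.
let content : Parser<string, unit> =
    let contentNormal = many1Satisfy (fun c -> c <> '{' && c <> '}')
    let contentEscaped = (pstring "{{" >>% "{") <|> (pstring "}}" >>% "}")
    many1 (contentNormal <|> contentEscaped) |>> fun parts -> String.concat "" parts

// Assumption: spaces are allowed inside an identifier after its first character.
let ident : Parser<string, unit> =
    let isFirst c = isLetter c || c = '_'
    let isRest c = isLetter c || isDigit c || c = '_' || c = ' '
    spaces >>. many1Satisfy2L isFirst isRest "identifier" .>> spaces

let text = content |>> Text
let tag = between (skipString "{") (skipString "}") ident |>> Tag
let token = text <|> tag
let tokens = many token .>> eof

// Hypothetical helper: unwrap the result or raise with FParsec's error message.
let parseTokens input =
    match run tokens input with
    | Success (result, _, _) -> result
    | Failure (message, _, _) -> failwith message

printfn "%A" (parseTokens "{abc} def")       // [Tag "abc"; Text " def"]
printfn "%A" (parseTokens "{{b}} and {tag}") // [Text "{b} and "; Tag "tag"]
```

Because each chunk is tried independently, escaped braces can appear at the start or end of a text block, which stringsSepBy does not allow.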

jyoung
  • Thanks! just piping to String.concat works perfectly (`many1 (contentNormal <|> contentEscaped) |>> String.Concat`), but I would like to see if I can get notEmpty to work correctly too – jjmac May 19 '14 at 03:09

notEmpty makes a parser fail when it would otherwise succeed without consuming input. If a parser succeeds without consuming any input, the parser's current position does not move forward, so running it inside many would loop forever if FParsec did not raise that exception. Here stringsSepBy succeeds after parsing zero elements; you can use notEmpty to make it fail in that case:

stringsSepBy contentNormal contentEscaped |> notEmpty
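
As a rough illustration of the difference, the sketch below runs both variants on a few inputs. It assumes an F# script with FParsec from NuGet; the succeeded helper and the names contentMayBeEmpty and contentNonEmpty are hypothetical and exist only for this comparison.

```fsharp
// Assumption: run with `dotnet fsi`, which resolves FParsec from NuGet.
#r "nuget: FParsec"
open FParsec

let contentNormal : Parser<string, unit> =
    many1Satisfy (fun c -> c <> '{' && c <> '}')

let contentEscaped : Parser<string, unit> =
    (pstring "{{" >>% "{") <|> (pstring "}}" >>% "}")

// Succeeds even when nothing matches, returning "" without consuming input.
let contentMayBeEmpty = stringsSepBy contentNormal contentEscaped

// Same parser, but a success that consumed nothing is turned into a failure.
let contentNonEmpty = stringsSepBy contentNormal contentEscaped |> notEmpty

// Hypothetical helper: did the parser succeed on this input?
let succeeded (p: Parser<'a, unit>) (input: string) =
    match run p input with
    | Success (_, _, _) -> true
    | Failure (_, _, _) -> false

printfn "%b" (succeeded contentMayBeEmpty "{tag}")  // true: empty match, which is what trips up many
printfn "%b" (succeeded contentNonEmpty "{tag}")    // false: the tag parser gets its turn
printfn "%b" (succeeded contentNonEmpty "a{{b}}c")  // true: normal text with escapes in between
printfn "%b" (succeeded contentNonEmpty "{{b}}c")   // false: stringsSepBy needs normal text first
```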

I also tried to get your full example to parse. The tags can include spaces, so ident needs to accept spaces as well:

let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = ' '

Another small adjustment is to return just a Token list rather than a Token list * unit tuple (unit is the result of eof), by using .>> instead of .>>. so that the result of eof is discarded:

let tokens = many token .>> eof  
Matthew Mcveigh
  • Thank you for the help! I wonder if I have the brace escaping coded correctly - when I pipe to notEmpty, escaped bracing only works if it's not at the beginning or end of the string. For example, `"a{{b}}c"` parses, but if I remove either a or c then I get an error. Any advice? – jjmac May 19 '14 at 03:04
  • `stringsSepBy` doesn't make sense in that case. If you think about it, you're really looking for many strings that can each be either an escaped brace or normal content, rather than normal content separated by escaped braces. So jyoung's answer is the solution to this problem – Matthew Mcveigh May 19 '14 at 06:22