7

I’m trying to parse C++ code. Therefore, I need a context-sensitive lexer. In C++, >> is either one or two tokens (>> or > >), depending on the context. To make it even more complex, there is also a token >>= which is always the same regardless of the context.

punctuation :: Bool -> Parser Token
punctuation expectDoubleGT = do
    c <- oneOf "{}[]#()<>%;:.+-*/^&|~!=,"
    case c of
        '>' ->
            (char '=' >> return TokGTEq) <|>
            if expectDoubleGT
                then (string ">=" >> return TokRShiftEq) <|> return TokGT
                else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT

When expectDoubleGT is False, this function works fine. However, when expectDoubleGT is True (the then branch, second-to-last line above), it gives an error when the input is >>.

*Parse> parseTest (punctuation True) ">"
TokGT
*Parse> parseTest (punctuation True) ">>="
TokRShiftEq
*Parse> parseTest (punctuation True) ">>"
parse error at (line 1, column 2):
unexpected end of input
expecting ">="

Why does the expression (string ">=" >> return TokRShiftEq) <|> return TokGT raise an error rather than returning TokGT when the remaining input is > (the first > was already consumed)?

Brian Tompsett - 汤莱恩

2 Answers

11

Parsec only tries the second parser in

p1 <|> p2

if p1 failed without consuming any input. On the input ">>", after the first '>' has been consumed,

string ">="

fails after consuming the leftover '>', so the second parser isn't tried.

You need a try

try (string ">=" >> return TokRShiftEq)

there so that if string ">=" fails, no input is consumed and the alternative parser is used.
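To make the effect of try concrete, here is a minimal sketch of just the '>' handling for the expectDoubleGT = True case. The names afterGT, gtToken and lexGT are made up for this illustration, and the Token type is a cut-down stand-in for the one in the question; only char, string, try, <|> and parse come from Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Cut-down stand-in for the question's Token type (assumption).
data Token = TokGT | TokGTEq | TokRShift | TokRShiftEq
    deriving (Show, Eq)

-- Hypothetical helper: lex the rest of a token after the leading '>'
-- has already been consumed, in the context where ">>" must be read
-- as two separate '>' tokens.
afterGT :: Parser Token
afterGT =
        (char '=' >> return TokGTEq)
        -- try rewinds the input if string ">=" fails partway through,
        -- so the final alternative still gets a chance to run.
    <|> try (string ">=" >> return TokRShiftEq)
    <|> return TokGT

-- Hypothetical wrapper: consume the leading '>' and then decide.
gtToken :: Parser Token
gtToken = char '>' >> afterGT

-- Hypothetical convenience function for experimenting in GHCi.
lexGT :: String -> Either ParseError Token
lexGT = parse gtToken ""
```

With this loaded in GHCi, lexGT ">>" gives Right TokGT (the second '>' is left in the input for the next token), while lexGT ">>=" gives Right TokRShiftEq and lexGT ">=" gives Right TokGTEq.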

Daniel Fischer
-1

Use libclang. It can parse all of C++. No matter how hard you try, you won't be able to do that with a hand-written parser.

Demi
  • While this isn't a good answer to the question, it is a useful comment. Parsing C and C++ means you [should locally accept ambiguity](http://stackoverflow.com/questions/4172342/complexity-of-parsing-c), and I'm not sure whether Parsec can do that. –  Jun 20 '16 at 12:50