2

I'm trying to parse SQL search conditions and having trouble getting the parser to differentiate logical (AND, OR) from other infix operators. I'm parsing them as different nodes (perhaps that's difficult to do), but simplifies the evaluation phase. Here's the relevant code snippet (I can include more if necessary).

let opp = OperatorPrecedenceParser<_,_,_>()
let scalarExpr = opp.ExpressionParser
opp.TermParser <- constant <|> id <|> between lparen rparen scalarExpr <|> scalarExpr

//infix operators added here

let comparison = //(e.g., 1 < 2)
  let compareExpr = pipe3 scalarExpr compareOp scalarExpr (fun l op r -> Comparison(op, l, r))
  between lparen rparen compareExpr <|> compareExpr

let andTerm = pstringCI "and" .>> ws
let orTerm = pstringCI "or" .>> ws

let searchCondition, searchConditionRef = createParserForwardedToRef()
searchConditionRef := 
  [ comparison 
    pipe3 searchCondition andTerm searchCondition (fun l _ r -> And(l, r))
    pipe3 searchCondition orTerm searchCondition (fun l _ r -> Or(l, r))
    between lparen rparen searchCondition ]
  |> choice

let filter : Parser<_,unit> = ws >>. searchCondition .>> eof

"1 = 1" correctly parses to Comparison (Eq,Constant (Int32 1),Constant (Int32 1))

but once I try to join two comparisons with a logical operator, e.g., "1 = 1 or 2 = 2", it fails to parse with

Error in Ln: 1 Col: 7
1 = 1 or 2 = 2
         ^
Expecting: end of input or infix operator
: 7

I expected it to parse the 1 before the error as a scalar expression and upon hitting or backtrack, realizing it's not an infix operator, return 1 as the complete scalar, and recognize it's parsing the left-hand side of a condition joined by logical operator or.

Instead, it seems to continue assuming 1 begins a more complex scalar expression, possibly involving an infix operator.

Is there a problem with the code, or is the solution to parse AND/OR as infix operators (using the same OperatorPrecedenceParser)? I'd rather not go that route, so I'm hoping I've made a simple mistake somewhere.

The complete code is on gist.

ildjarn
  • 62,044
  • 9
  • 127
  • 211
Daniel
  • 47,404
  • 11
  • 101
  • 179
  • It's not backtracking by default. I guess changing `comparison` to `attempt comparison` in `searchConditionRef` will make the parser work correctly on your example. – pad Feb 09 '12 at 18:00
  • @pad: I tried that, but same error. – Daniel Feb 09 '12 at 18:04
  • Sorry for not reading carefully, you have to put `attempt` on the second case of And parser also. Your example use `or`; it matches the third case in `searchConditionRef`. – pad Feb 09 '12 at 18:06
  • I tried putting `attempt` on every case in `searchConditionRef`--no change. It seems odd to me to use `attempt` inside `choice` to force backtracking. – Daniel Feb 09 '12 at 18:07
  • @pad: I put the complete code on gist, if that helps. The link is in my question. – Daniel Feb 09 '12 at 18:30
  • @Daniel - congratulations on reaching the 10k mark. I thought I was going to beat you to it but got real busy the past few months ;) – Stephen Swensen Feb 09 '12 at 18:56
  • @StephenSwensen: you're not that far; being more active could help :) – pad Feb 09 '12 at 19:13
  • @StephenSwensen: Ha! Thanks. I noticed you hadn't been around as much lately. – Daniel Feb 09 '12 at 19:20

3 Answers3

4

I think ultimately you'll find you need to treat and and or as infix operators with precedence rules because that is exactly what they are and is the reason why most parsers including fparsec and fsyacc have special features to handle them (i.e. resolve ambiguity through precedence and associativity rules).

You've found one case highlighting this, but consider another:

1 = 1 or 2 = 2 and 3 =3

should that parse as (1 = 1 or 2 = 2) and 3 = 3 or 1 = 1 or (2 = 2 and 3 = 3)?

Stephen Swensen
  • 22,107
  • 9
  • 81
  • 136
  • If this turns out to be the case, I'm not sure how to restructure the grammar. Obviously logical ops are not allowed in the same places as math ops. Do you know of any example FParsec parsers that address this? Google hasn't turned up anything yet. Btw, I have a working fslex/flyacc implementation. I was trying to replicate it, but a symmetric translation may be impossible. – Daniel Feb 09 '12 at 22:06
  • Right, I think this would be better solved in a semantic analysis phase rather than trying to build the semantics of numeric versus logic ops into the grammar (e.g. http://code.google.com/p/nl-compiler/source/browse/NL/SemanticAnalysis.fs#760). While you may be able to get away with it for simple scalar expressions (I can imagine in fslex/fsyacc, but don't have enough FParsec experience), you will run into trouble with complex expressions where the overall type of the expression is determine by recursively analyzing sub-expressions. – Stephen Swensen Feb 09 '12 at 22:48
  • Great, now I'm on my way to 10k! – Stephen Swensen Feb 09 '12 at 22:52
  • I think I found a solution. The only drawback is it makes logical operators case-sensitive. – Daniel Feb 09 '12 at 23:30
2

Your parser stops after the first equation, because the choice combinator of the searchCondition applies the first argument parser comparison to the input and upon success simply returns the result of the argument parser. You then get an error because filter fails to parse the eof after the searchCondition.

The choice and <|> combinators do not implement a longest match rule and they do not backtrack after an error, as explained in the tutorial. So your searchCondition parser can't work.

Another problem is that your searchCondition parser is left-recursive, since the second and third choice arguments will try to apply searchCondition again without previously consuming any input. Left-recursion will lead to a stack overflow.

Similary, having <|> scalarExpr at the end of the opp.TermParser definition is unnecessary and can lead to infinite recursions.

When you translate a left-recursive parser grammar to FParsec, you need to eliminate the left-recursion.

One way to fix the searchCondition parser is to factor out the left-hand-side expression:

let andTerm = stringCIReturn "and" (fun l r -> And(l, r)) .>> ws
let orTerm = stringCIReturn "or" (fun l r -> Or(l, r)) .>> ws

let searchCondition, searchConditionRef = createParserForwardedToRef()

do searchConditionRef:=
    let comparisonTerm =
        comparison <|> between lparen rparen searchCondition

    pipe2 comparisonTerm (many ((andTerm <|> orTerm) .>>. comparisonTerm)) 
          (fun l opRList -> 
                List.fold (fun l (op, r) -> op l r) l opRList)

Or even simpler:

do searchConditionRef:= 
    chainl1 (comparison <|> between lparen rparen searchCondition)
            (andTerm <|> orTerm)        

Update: In the grammar there's also a problem with the parens parsing, see the comments below.

Stephan Tolksdorf
  • 3,062
  • 21
  • 28
  • Using this code, it doesn't make it past the first infix operator, but it's possible I've changed something elsewhere that causes it to not work. Is there anything wrong with using a separate `OperatorPrecedenceParser` as shown in my answer? – Daniel Feb 10 '12 at 15:46
  • It does work for me: http://pastebin.com/i7JFJWJE An `OperatorPrecendenceParser` just for two operators might be a bit overkill... Anyway, I mainly wrote this reply to point out why the previous implementation couldn't work. It's really important to understand the behaviour of `<|>` and `choice` and that you can't directly use left-recursion in an FParsec parser. – Stephan Tolksdorf Feb 10 '12 at 16:15
  • Thanks for the detailed explanation; it really helps. While your code parses `1 = 1 or 2 = 2`, wrapping it in parens throws it off. – Daniel Feb 10 '12 at 17:13
  • Yes, the nested but not mutally recursive expression grammars make the parens parsing a bit nasty here. The problem is that when the parser sees an opening bracket at certain locations, it doesn't yet know whether the parenthesized expression needs to be parsed as a `scalarExpr`, `comparison` or `searchCondition`. To be able to parse such expressions, we have to introduce some limited backtracking for parser errors after opening brackets and before closing brackets, so that the parser can tentatively parse a parenthesized expression with one subgrammar and, if necessary, parse again with ... – Stephan Tolksdorf Feb 11 '12 at 11:41
  • ... a different grammar. See the updated code at http://pastebin.com/i7JFJWJE With the usual expression grammars, where any parenthesized (top-level) expression is valid everywhere where a leaf term is valid, parsing is obviously simpler, because you just need to deal with parens at one place in the grammar. This is another argument for just using a single `OperatorPrecedenceParser`, as Stephen Swensen suggested. However, you'll have to annotate the AST with source locations if you want to be able to generate good error messages after parsing. – Stephan Tolksdorf Feb 11 '12 at 11:49
1

Creating a separate OperatorPrecedenceParser for logical operators seems to have fixed it.

I replaced

let andTerm = pstringCI "and" .>> ws
let orTerm = pstringCI "or" .>> ws

let searchCondition, searchConditionRef = createParserForwardedToRef()
searchConditionRef := 
  [ comparison 
    pipe3 searchCondition andTerm searchCondition (fun l _ r -> And(l, r))
    pipe3 searchCondition orTerm searchCondition (fun l _ r -> Or(l, r))
    between lparen rparen searchCondition ]
  |> choice

with

let condOpp = OperatorPrecedenceParser()
let searchCondition = condOpp.ExpressionParser
condOpp.TermParser <- (attempt comparison) <|> between lparen rparen searchCondition <|> searchCondition
condOpp.AddOperator(InfixOperator("or", ws, 1, Assoc.Left, fun l r -> Or(l, r)))    
condOpp.AddOperator(InfixOperator("and", ws, 2, Assoc.Left, fun l r -> And(l, r)))    

(1 = 1 or 2 = 2) and 3 = 3 and 1 = 1 or (2 = 2 and 3 = 3) parse correctly.

Daniel
  • 47,404
  • 11
  • 101
  • 179