6

The documentation for Parsec.Expr.buildExpressionParser says:

Prefix and postfix operators of the same precedence can only occur once (i.e. --2 is not allowed if - is prefix negate).

However, I would like to parse such strings.

Concretely, consider the following grammar:

sentence: 
    | identifier
    | "~" sentence
    | sentence & sentence
    | "!" sentence

Where operator precedence is: "~" binds stronger than "&" binds stronger than "!"

For example, I would like the sentence

! ~a & b

to be parsed as

! ( (~a) & b )

And the sentence

~ ! a & b 

as

~( ! ( a & b) )

Parsec allows me to do this (and specify the operator precedence), however, I would like to be able to chain prefixes, e.g. ~ ~ ! ~ a. Parsec does not allow this. I have found the solution for chaining prefixes, but this solution does not allow me to specify a different operator priority for the different prefix operators (either both "~" and "!" bind stronger than "&", or none of them does)

Does anyone have a solution for this?

Edit:

Partial solution that gets the operator bindings correct, but allows no chaining: http://lpaste.net/143362

Partial solution with chaining but that has a wrong binding for the "~" operator: http://lpaste.net/143364

Edit: Some more clarifications related to the latest answer.

I actually want & to be associative. Left or right does not matter. Left vs right associativity only matters between operators of the same precedence. For your examples, it is all resolved by noting that & binds stronger than ! (& has greater operator precedence)

Hence, the expression you were worried about:

a & ! b & c should become: (first bind & where possible) a & ! (b & c)

Similarly, ! a & ! b & c should be parsed (first bind &) ! a & ! (b & c), thus ! a & (! (b & c)), thus ! (a & (! (b & c)))

Community
  • 1
  • 1
BartBog
  • 1,889
  • 14
  • 28
  • Can you show your partial solution? I coded up something at http://lpaste.net/143362 which doesn't do chaining or repeated prefixes - it just tries to get the priorities right. – ErikR Oct 19 '15 at 14:28
  • I have two partial solutions acutally. One of them looks a lot like your coding and ignores the repeated prefixes. The other does the chaining, but gets the priorities wrong. (they are on their way) – BartBog Oct 19 '15 at 14:32
  • I added the partial solutions. To be precise, I recuperated one of yours and started from that one to get the other... – BartBog Oct 19 '15 at 14:37

4 Answers4

4

I wasn't satisfied with my original answer since it doesn't solve the general case of prefix and postfix operators at various precedences, and it requires the programmer to have to think about the grammar instead of just relying on buildExpressionParser to do the right thing.

I hunted around online and discovered the Pratt method for recursive descent parsing of expressions. I was able to implement a compact Haskell version that replaces buildExpressionParser. It has exactly the same interface as buildExpressionParser, but doesn't require you to use the chained prefix combinators or muck around with the term parser. I played around with your grammar, changing the associativity of &, and switching the prefix operators to postfix operators, and it all seems to work...

buildPrattParser table termP = parser precs where

  precs = reverse table

  prefixP = choice prefixPs <|> termP where
    prefixPs = do
      precsR@(ops:_) <- tails precs 
      Prefix opP <- ops
      return $ opP <*> parser precsR

  infixP precs lhs = choice infixPs <|> pure lhs where
    infixPs = do
      precsR@(ops:precsL) <- tails precs
      op <- ops
      p <- case op of
        Infix opP assoc -> do
          let p precs = opP <*> pure lhs <*> parser precs
          return $ case assoc of
            AssocNone  -> error "Non associative operators are not supported"
            AssocLeft  -> p precsL
            AssocRight -> p precsR
        Postfix opP ->
          return $ opP <*> pure lhs
        Prefix _ -> mzero
      return $ p >>= infixP precs

  parser precs = prefixP >>= infixP precs
pat
  • 12,587
  • 1
  • 23
  • 52
  • Note that this does not (yet) detect the use of left and right associative operators at the same precedence, or handle non associative operators. If your grammar does not include such operators then you should be good to go! – pat Nov 06 '15 at 15:24
2

One problem with my partial solution at http://lpaste.net/143362 is that it doesn't recognize ~ ! a.

However, if you change the operator table to:

table   = [ [ Prefix tilde ]
          , [ Infix amper AssocLeft ]
          , [ Prefix bang ]
          , [ Prefix tilde ]
          ]

it can parse that expression as well as ! ~a & b, ~ ! a & b correctly. Code at: http://lpaste.net/143370

So now combine this idea with your chaining and try:

table   = [ [ Prefix (chained tilde) ]
          , [ Infix amper AssocLeft ]
          , [ Prefix (chained bang) ]
          , [ Prefix (chained tilde) ]
          ]

chained  p = chainl1 p $ return (.)

Code at: http://lpaste.net/143371

ErikR
  • 51,541
  • 9
  • 73
  • 124
  • Thanks a lot for the effort. However, there are still two problems with this solution: 1. It does not parse expressions of the form `! ~ ! a` 2. It parses `~ a & b` wrong (parses it as `~ (a & b)` instead of `(~a) & b` – BartBog Oct 20 '15 at 12:06
2

The left-factored grammar for the parser you want is:

sentence : '!' sentence
         | sentence1

sentence1 : sentence2 '&' sentence1
          | sentence2

sentence2 : '~' sentence2
          | term

term : '!' sentence
     | ident

Which can be rewritten in EBNF as:

sentence : '!'* sentence1

sentence1 : sentence2 ('&' sentence2)*

sentence2 : '~'* term

term : '!' sentence
     | ident

The parser generated by buildExpressionParser using chained prefix operators almost generates this parser, except that it doesn't include the ! rule in the term parser; hence the parse error when a ! is encountered after a ~.

Given the following:

{-# LANGUAGE NoMonomorphismRestriction #-}
module Main where

import Control.Monad
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.Char
import Control.Applicative ( (<*), (*>), (<*>), (<$), (<$>) )

data Sentence = Tilde Sentence
              | Bang Sentence
              | Amper Sentence Sentence
              | Ident String
  deriving ( Eq, Ord, Show )

bangP  = Bang  <$ lexeme (char '!')
amperP = Amper <$ lexeme (char '&')
tildeP = Tilde <$ lexeme (char '~')
identP = Ident <$> lexeme (many1 alphaNum)

lexeme = (<* spaces)

parser = spaces *> sentence <* eof

main = do
  let inputs = [ "a", "! a", "~ a", "a & b", "! a & b"
               , "~ a & b", "! ~ a & b", "~ ! a & b", "! ~ ! a"
               , "~ a & b", "a & ! b & c & d"
               ]
  forM_ inputs $ \input -> do
    putStr input
    putStr " -> "
    parseTest parser input

We can define the sentence parser by hand:

sentence = sentence0 where
  sentence0 = chainl bangP (return (.)) id <*> sentence1
  sentence1 = chainl1 sentence2 amperP
  sentence2 = chainl tildeP (return (.)) id <*> term
  term = (bangP <*> sentence0) <|> identP

or we can use buildExpressionParser if we add the ! rule into the term parser:

sentence = buildExpressionParser table term where
  table = [ [prefix tildeP]
          , [Infix amperP AssocLeft]
          , [prefix bangP]
          ]
  term = (bangP <*> sentence) <|> identP
  prefix  p = Prefix . chainl1 p $ return (.)
pat
  • 12,587
  • 1
  • 23
  • 52
  • This really looks like what I was looking for. Especially the buildExpressionParser method seems good. – BartBog Nov 03 '15 at 10:55
1

A new answer...

Have you thought of the associativity of the & operator?

Here is another idea I came up with assuming that & is right associative.

  1. Collect the sequence of prefix operators preceding a term.
  2. Parse the term (either an ident or a paren expression)
  3. Fix up the term by shifting over ~ operators from the sequence collected in step 1.
  4. If the next token is an &, the LHS of the amper operator is the fixed up term. The remaining operators are applied to amper expression.
  5. Otherwise the result is just the prefix operators applied to the term.

I believe associativity of the & matters, e.g. do we have:

a & ! b & c  -->   a & (! b & c)  --> a & ! (b & c)

or

a & ! b & c  -->   (a & (! b)) & c

Another case to think about is ! a & ! b & c - how do you want that parsed?

An implementation:

 {-# LANGUAGE NoMonomorphismRestriction, FlexibleContexts #-}

 import Text.Parsec
 import Control.Monad
 import Text.ParserCombinators.Parsec hiding (runParser, try)
 import Text.Parsec.Char

 data Sentence = Ident String | Bang Sentence | Tilde Sentence | Amper Sentence Sentence
   deriving (Show)

 lexer p = do x <- p; spaces; return x
 ident = lexer (many1 letter)
 sym ch  = lexer (char ch)

 tilde = sym '~'
 bang  = sym '!'
 amper = sym '&'

 parens p = between (sym '(') (sym ')') p

 term    =  parens expr 
          <|> (fmap Ident ident)
          <?> "simple expression"

 prefixOps = many (try tilde <|> bang)

 expr = do
   ops <- fmap reverse prefixOps
   lhs <- term

   let (ops', lhs') = popTildes ops lhs
       pre = mkPrefixNode ops'

   mrhs <- try (fmap Just (amper >> expr)) <|> (return Nothing)

   case mrhs of
     Nothing  -> return $ pre lhs'
     Just rhs -> return $ pre (Amper lhs' rhs)

 popTildes :: [Char] -> Sentence -> ([Char], Sentence)
 popTildes ('~':rest) s = popTildes rest (Tilde s)
 popTildes ops s        = (ops, s)

 mkPrefixNode :: [Char] -> (Sentence -> Sentence)
 mkPrefixNode [] = id
 mkPrefixNode ('~':rest) = mkPrefixNode rest . Tilde
 mkPrefixNode ('!':rest) = mkPrefixNode rest . Bang 
 mkPrefixNode _          = error "can't happen"

 check :: String -> IO ()
 check input = do
   let padded = input ++ (replicate (15-length input) ' ')
   case parse expr "-" input of
     Left e  -> do putStrLn $ "FAILED: " ++ input
                   putStrLn $ "  " ++ show e
     Right x -> do putStrLn $ "OK: " ++ padded ++ " -> " ++ show x

 inputs = [ "a", "! a", "~ a", "a & b", "! a & b", "~ a & b", "! ~ a & b"
          ,  "~ ! a", "! ~a & b", "~ ! a & b ", "! ~ ! a 2"
          ]

 main = mapM_ check inputs
ErikR
  • 51,541
  • 9
  • 73
  • 124
  • Thanks a lot. This is a very useful answer. This solution parses things as I wish. I notice that you're now basically doing all the parsing manually (no longer using BuildExpressionParser). I will still have to see whether I can integrate this in my actual example (which is a lot more complicated, with many more binary and unary operators -> this is the reason why I was trying to avoid building an expression parser myself) – BartBog Oct 21 '15 at 07:33
  • My feeling is that you can't use buildExpressionParser to do this. The parser it generates is very hierarchical and it doesn't perform any lookahead. For each precedence level it generates a parser where the terms are what the parser for the previous level recognizes and the operators are those for the current precedence level. Reading the code might be enlightening - it's actually a pretty simple idea: [(link)](https://hackage.haskell.org/package/parsec-3.1.9/docs/src/Text-Parsec-Expr.html#buildExpressionParser). Any, good luck, and I'd be interested to know what you come up with. – ErikR Oct 21 '15 at 08:29