6

The problem

I came across a problem today and I do not know how to solve it. It is very strange to me, because the code I've written should (according to my current knowledge) is correct.

So below you can find a sample parser combinators. The most important one is pOperator, which in very simple way (only for demonstration purposes) builds an operator AST. It consumes "x" and can consume multiple "x" separated by whitespaces.

I've got also pParens combinator which is defined like:

pPacked pParenL (pWSpaces *> pParenR)

so it consumes Whitespaces before closing bracket.

Sample input / output

The correct input/output SHOULD be:

in: "(x)"
out: Single "x"

in: "(x )"
out: Single "x"

but I'm getting:

in: "(x)"
out: Single "x"

in: "(x )" 
out: Multi (Single "x") (Single "x")
--  Correcting steps: 
--    Inserted  'x' at position LineColPos 0 3 3 expecting one of ['\t', ' ', 'x']

but in the second example I'm getting error - and the parser behaves like it greedy eats some tokens (and there is no greedy operation).

I would be thankful for any help with it.

Sample code

import Prelude hiding(lex)
import Data.Char hiding (Space)
import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import qualified Text.ParserCombinators.UU.Utils as Utils
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)


data El = Multi El El
        | Single String
        deriving (Show)


---------- Example core grammar ----------

pElement     = Single <$> pSyms "x"
pOperator    = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)

---------- Basic combinators ----------

applyAll x (f:fs) = applyAll (f x) fs
applyAll x []     = x

pSpace    = pSym ' '
pTab      = pSym '\t'
pWSpace   = pSpace <|> pTab
pWSpaces  = pMany pWSpace
pWSpaces1 = pMany1 pWSpace
pMany1 p  = (:) <$> p <*> pMany p

pSyms []       = pReturn []
pSyms (x : xs) = (:) <$> pSym x <*> pSyms xs

pParenL     = Utils.lexeme $ pSym '('
pParenR     = Utils.lexeme $ pSym ')'
pParens     = pPacked pParenL (pWSpaces *> pParenR)

---------- Program ----------

pProgram = pParens pOperator
-- if you replace it with following line, it works:
--  pProgram = pParens pElement
-- so it seems like something in pOperator is greedy

tests = [ ("test", "(x)")
        , ("test", "(x )")
        ]

---------- Helpers ----------

type Parser a = P (Str Char String LineColPos) a

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    mapM_ (\(desc, p) -> putStrLn ("\n=== " ++ desc ++ " ===") >> run pProgram p) tests
    return ()

run :: Show t =>  Parser t -> String -> IO ()
run p inp = do  let (a, errors) =  parse p inp
                putStrLn ("--  Result: \n" ++ show a)
                if null errors then  return ()
                               else  do putStr ("--  Correcting steps: \n")
                                        show_errors errors
                putStrLn "-- "
             where show_errors :: (Show a) => [a] -> IO ()
                   show_errors = sequence_ . (map (putStrLn . show))

IMPORTANT

pOperator    = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)

is equivalent to:

foldr pChainl pElement (Multi <$ pWSpaces1)

according to: Combinator Parsing: A Short Tutorial

And it is used to define operator precedense.

Wojciech Danilo
  • 11,573
  • 17
  • 66
  • 132
  • I don't have a good solution, but your description seems to be exactly what's happening. If I define `let pOperator = applyAll <$> pElement <*> (pMany (flip <$> (Multi <$ pSome pWSpace) <*> pElement) <|> pure [])` I get the expected result, so `pMany` appears to be committing to another match after matching the space. – John L Aug 17 '13 at 23:51
  • @JohnL: That's very strange. Please notice that replacing `pProgram = pParens pOperator` with `pProgram = pParens pElement` gives good result (that of course does not solve the problem either), but it shows, that `pMany` CAN work as expected - can consume no elements. – Wojciech Danilo Aug 17 '13 at 23:55
  • @JohnL: additional the problem cannot be solved with `... <|> pure []`, because it works only for 1 character input, it fails for example for `(x x )`. – Wojciech Danilo Aug 18 '13 at 00:38
  • Ok, the problem "can be theoretically solved" by replacing the library `pMany` combinator with custom one and replacing a lot of library functions (like `pChainl` etc to use our custom `pMany` combinator). This is of course ugly solution, but it works so far. I would love to see the proper one. The custom `pMany` can be declared as follows: `pMany p = (:) <$> p <*> pMany p <|> pure []` – Wojciech Danilo Aug 18 '13 at 00:48
  • Again a work around, but could pMany be replaced with pList_ng? – OllieB Aug 18 '13 at 09:00
  • @OllieB: Yes, this gives exactly the same behavior as my custom `pMany` – Wojciech Danilo Aug 18 '13 at 12:13

1 Answers1

1

The definition of pMany reads:

pMany :: IsParser p => p a -> p [a]
pMany p = pList p

and this suggest the solution. When seeing the space we should not commit immediately to the choice to continue with more x-es so we define:

pMany :: IsParser p => p a -> p [a]
pMany_ng p = pList_ng p

Of course you may also call pList_ng immediately. Even better would be to write:

pParens (pChainr_ng (pMulti <$ pWSpaces1) px) -- 

I did not test it since I am not sure whether between x-es there should be at least one space etc.

Doaitse